Can Blockchain Store Large Files?

Blockchains can technically store data, but the architecture makes large file storage prohibitively expensive. The correct pattern is an on-chain reference pointing to off-chain storage — and here's why that's not a workaround.
Lewis Jackson
CEO and Founder

The short answer is: technically yes, practically no — and the "practically no" matters more than most explanations let on.

Blockchains can store raw data. Ethereum has a calldata field in transactions where arbitrary bytes can be included. Bitcoin can embed small amounts of data using OP_RETURN outputs. The blockchain will record it, and every node on the network will store it — forever. But "can" and "should" are different questions here, and the economics make large file storage effectively impossible at any real scale.

Why Blockchains Aren't Designed for Storage

The blockchain's job is to maintain an agreed-upon record of state changes: who owns what, which transactions occurred, what smart contract code exists. It accomplishes this by having every participant in the network store a complete copy of the ledger.

That replication is what makes blockchains secure and trustless. It's also what makes large file storage structurally absurd.

Every full node on the Ethereum network stores the chain's data and its current state. That already runs to roughly a terabyte or more, and it is mostly lightweight data: smart contract interactions, token transfers, balance changes. If developers started routinely using Ethereum to store images or documents, that figure would grow far faster than the network is designed to absorb.

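The cost mechanism reflects this directly. On Ethereum, every byte of calldata costs gas: roughly 16 gas per non-zero byte and 4 per zero byte under the pricing set by EIP-2028. Storing a single 1MB image as calldata takes about 16 million gas, which works out to somewhere between $50 and several thousand dollars depending on gas and ETH prices. A 100MB file would cost roughly a hundred times that, more than most people's monthly salaries, and would have to be split across many transactions because it needs far more gas than a single block allows. A feature film would be economically impossible at any realistic gas price.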
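To make the arithmetic concrete, here is a minimal sketch of the estimate. It assumes the worst case where every byte is non-zero, uses the 16 gas per byte figure quoted above, and adds the 21,000 gas base cost of a transaction. The gas prices and the $3,000 ETH price are illustrative assumptions, not market data, and the function name is made up for this example.

```python
# Rough calldata cost estimate. Prices passed in are illustrative assumptions.
GAS_PER_NONZERO_BYTE = 16   # figure quoted in the text
BASE_TX_GAS = 21_000        # intrinsic gas cost of any Ethereum transaction


def calldata_cost_usd(size_bytes: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Worst-case cost (all bytes non-zero) of posting size_bytes as calldata."""
    gas = BASE_TX_GAS + size_bytes * GAS_PER_NONZERO_BYTE
    eth = gas * gas_price_gwei * 1e-9  # 1 gwei = 1e-9 ETH
    return eth * eth_usd


if __name__ == "__main__":
    one_mb = 1_000_000
    for gwei in (1, 20, 100):  # assumed quiet, busy, and congested gas prices
        cost = calldata_cost_usd(one_mb, gwei, eth_usd=3_000)
        print(f"{gwei:>3} gwei: ${cost:,.2f}")
```

Under those assumptions the low end lands near $50 and the high end in the thousands, which is where the range in the paragraph above comes from. The sketch also ignores the fact that a large file would not fit in one transaction, so real costs would be somewhat higher.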

Bitcoin is even more constrained. Under the long-standing default relay policy, an OP_RETURN output carries up to 80 bytes. Eighty bytes. That's shorter than a tweet, let alone a file.
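Eighty bytes is still enough for the thing that matters: a hash. The sketch below builds the raw bytes of an OP_RETURN script around a 32-byte SHA-256 digest. The opcode values (0x6a for OP_RETURN, 0x4c for OP_PUSHDATA1) are standard Bitcoin script; the function name and the hard-coded 80-byte cap are assumptions for illustration, and this is only the script, not a signed transaction.

```python
import hashlib

OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C
MAX_PAYLOAD = 80  # the limit discussed in the text


def op_return_script(payload: bytes) -> bytes:
    """Build an OP_RETURN script carrying payload (1 to 80 bytes)."""
    if not 1 <= len(payload) <= MAX_PAYLOAD:
        raise ValueError(f"payload must be 1..{MAX_PAYLOAD} bytes, got {len(payload)}")
    if len(payload) <= 75:
        # Pushes of 1..75 bytes use the length itself as the push opcode.
        push = bytes([len(payload)])
    else:
        # Longer pushes (here 76..80 bytes) need OP_PUSHDATA1 plus a length byte.
        push = bytes([OP_PUSHDATA1, len(payload)])
    return bytes([OP_RETURN]) + push + payload


if __name__ == "__main__":
    digest = hashlib.sha256(b"contents of some large file").digest()
    script = op_return_script(digest)
    print(len(digest), "byte hash ->", len(script), "byte script")
    print(script.hex())
```

A 32-byte digest fits with room to spare, which is why the limit is a problem for files but not for references to them.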

What Actually Gets Stored On-Chain

What blockchains store well is references, not files.

The standard pattern: upload a file to a decentralized storage network — IPFS or Arweave, for example — then commit a cryptographic hash of that file to the blockchain. The hash is short (32 to 64 bytes typically), and it functions as a tamper-proof pointer. If the file is ever changed, the hash changes, and the mismatch is immediately detectable. The chain doesn't hold the file; it holds the proof that a specific file existed at a specific time.
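The tamper check described above can be sketched in a few lines. This uses a plain SHA-256 digest as a stand-in for whatever hash the application anchors; real IPFS content identifiers are derived differently (the file is chunked and the hash is wrapped in extra encoding), so treat the function names and the hashing choice here as assumptions for illustration.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the hex SHA-256 digest that would be committed on-chain."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, anchored_hash: str) -> bool:
    """Check that off-chain data still matches the hash anchored earlier."""
    return content_hash(data) == anchored_hash


if __name__ == "__main__":
    original = b"quarterly-report-v1 contents"
    anchored = content_hash(original)   # this short value goes on-chain
    tampered = b"quarterly-report-v2 contents"

    print("hash size in bytes:", len(bytes.fromhex(anchored)))
    print("original matches:", verify(original, anchored))
    print("tampered matches:", verify(tampered, anchored))
```

The file can be any size; the value that goes on-chain stays 32 bytes, and any change to the file makes the check fail.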

This is how most NFTs actually work, which surprises people who assume the blockchain holds the image. Usually it doesn't. What the NFT token record contains is a metadata URI — often an IPFS content identifier, or CID — that points to a JSON file describing the asset. That JSON file in turn points to the actual image hosted on IPFS or another storage service.

The blockchain guarantees who owns the token. It doesn't guarantee the image still exists. That's a genuinely important distinction. Several NFT collections have lost access to their imagery because the images were hosted on centralized servers that went offline. The token persists on-chain; the asset it referenced is gone.
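A toy model makes the two-hop lookup and its failure mode easier to see. Everything here is hypothetical: the dictionaries stand in for the on-chain token record and for off-chain storage, and the URIs are placeholders rather than real content identifiers.

```python
import json

# Hypothetical on-chain record: token id -> owner and metadata URI.
TOKENS = {1: {"owner": "0xAlice", "token_uri": "ipfs://metadata-cid"}}

# Hypothetical off-chain storage: URI -> stored bytes.
OFF_CHAIN = {
    "ipfs://metadata-cid": json.dumps(
        {"name": "Example #1", "image": "ipfs://image-cid"}
    ).encode(),
    "ipfs://image-cid": b"\x89PNG placeholder image bytes",
}


def resolve_image(token_id: int, tokens: dict, storage: dict):
    """Follow token -> metadata JSON -> image. Returns None if a hop is missing."""
    metadata_raw = storage.get(tokens[token_id]["token_uri"])
    if metadata_raw is None:
        return None
    image_uri = json.loads(metadata_raw)["image"]
    return storage.get(image_uri)


if __name__ == "__main__":
    print("image found:", resolve_image(1, TOKENS, OFF_CHAIN) is not None)

    # Simulate the host dropping the image: the token record is untouched.
    degraded = {k: v for k, v in OFF_CHAIN.items() if k != "ipfs://image-cid"}
    print("owner still recorded:", TOKENS[1]["owner"])
    print("image found after loss:", resolve_image(1, TOKENS, degraded) is not None)
```

Ownership lives in one system and the asset in another, so losing the second leaves the first intact but pointing at nothing.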

Arweave addresses this more directly than IPFS by building permanent storage incentives into the protocol — but even that's a separate system sitting alongside the blockchain, not embedded in it.

Where the Constraints Actually Live

There are three binding constraints worth understanding separately.

Block size limits come first. Every blockchain imposes caps on how much data fits in a single block. Ethereum caps blocks by gas rather than by bytes, with the limit adjusting around a target; in practice that holds blocks to the low megabytes at most, and typical blocks are far smaller. Bitcoin caps blocks at 4 million weight units, which means 1MB of non-witness data and somewhat more once SegWit witness data is counted. These caps exist to ensure that nodes running on reasonable hardware can keep up with block propagation across the network. Larger blocks slow propagation and raise the minimum cost of running a full node, and both effects push toward centralization.

Node replication is the second constraint. Because all full nodes store all data, storage costs compound across the entire network. One node storing a 1GB file is trivial. Ten thousand nodes each storing that same 1GB is ten terabytes of aggregate storage committed permanently to a single file. The economics don't scale.

Fee markets are the third. Gas fees on Ethereum represent competition for block space. Large data submissions compete with smart contract interactions for that space, and developers who need storage don't want to pay fees priced for financial settlement. The market naturally routes data off-chain.

What's Changing

EIP-4844 (proto-danksharding) activated in March 2024 and introduced "blobs": temporary data packets that rollups use for their transaction data. Blobs are cheaper than permanent calldata because nodes can prune them after roughly 18 days and because they are priced in their own fee market. They're a meaningful cost reduction for Layer 2 networks posting transaction batches to Ethereum.

But blobs don't solve the large-file problem. They're temporary by design, still size-limited, and purpose-built for rollup batch data, not general storage.
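The limits are easy to quantify. The sketch below uses the EIP-4844 constants as I understand them: 4,096 field elements of 32 bytes per blob, and a 4,096-epoch retention window. The per-block maximum of 6 blobs is the value at launch and is an assumption here, since later upgrades adjust it. The calculation also treats every byte of a blob as usable, which slightly overstates real capacity.

```python
import math

FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 32
BLOB_BYTES = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT  # 131,072 bytes

MAX_BLOBS_PER_BLOCK = 6   # assumption: launch value, changed by later upgrades
RETENTION_EPOCHS = 4096   # minimum time nodes must serve blob data
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12


def retention_days() -> float:
    """Length of the blob retention window in days."""
    return RETENTION_EPOCHS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT / 86_400


def blobs_needed(size_bytes: int) -> int:
    """Number of blobs required to carry size_bytes of data."""
    return math.ceil(size_bytes / BLOB_BYTES)


def blocks_needed(size_bytes: int) -> int:
    """Blocks required if the data took every blob slot in each block."""
    return math.ceil(blobs_needed(size_bytes) / MAX_BLOBS_PER_BLOCK)


if __name__ == "__main__":
    size = 100_000_000  # a 100MB file
    print(f"blob size: {BLOB_BYTES} bytes")
    print(f"retention: {retention_days():.1f} days")
    print(f"blobs for 100MB: {blobs_needed(size)}")
    print(f"full blocks for 100MB: {blocks_needed(size)}")
```

Even with every blob slot to itself, a 100MB file would take well over a hundred consecutive blocks, and the data would be gone from most nodes a little over two weeks later.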

The more relevant development is the maturing of dedicated decentralized storage networks that sit alongside blockchains: Arweave for permanent storage with one-time payment, IPFS for content-addressed storage (with the caveat that someone needs to pin your files), and Filecoin for incentivized storage via an active marketplace. These systems are increasingly the correct infrastructure layer for any application needing to store more than a few hundred bytes.

The hybrid pattern — content hash on-chain, file off-chain — is well-established and increasingly standardized. It isn't a workaround or a compromise. It's the right architecture.

What Would Confirm the Pattern Holds

If Layer 1 block size limits remain stable or shrink — which is the likely direction, given ongoing concern about state growth and the minimum cost of running a node — this constraint strengthens. Ethereum's roadmap (including state expiry and statelessness research) explicitly frames large on-chain data as a problem to manage, not a feature to expand.

Continued adoption of hybrid storage patterns, combined with growing decentralized storage network usage, would confirm the direction.

What Would Change the Picture

A blockchain that redesigned itself to treat storage as a first-class primitive — with different node tiers, storage-specific sharding, and pricing models calibrated to real storage economics — could behave differently. Arweave is the clearest example: it uses a blockchain-like structure but optimizes entirely for permanent cheap storage rather than transaction processing. It's a different design from the ground up.

At the margin, a network willing to tolerate significantly higher hardware requirements for node operators could allow larger blocks and cheaper on-chain data. That's a tradeoff against decentralization, and the major networks have generally not been willing to make it.

Timing Perspective

This constraint isn't going away for mainstream chains in any near-term window. If you're building something that needs to store documents, images, or larger data, the correct pattern today is: use decentralized storage and anchor the content hash on-chain. That's not a stopgap — it's the established engineering practice.

The question of which decentralized storage network to use is a separate evaluation with real tradeoffs around cost, permanence, and censorship resistance. But the architectural separation between storage and settlement is stable.

Boundary Statement

This post explains why blockchains aren't suited for large file storage and how the hybrid on-chain reference pattern works. It doesn't evaluate specific decentralized storage networks against each other, address every chain's data tolerance, or assess whether on-chain data storage is appropriate for any particular use case. Those require specific context.

The mechanism described — replicated state, fee markets, block size limits — applies to Bitcoin, Ethereum, and most networks that prioritize decentralization. Some specialized chains have made different tradeoffs.
