The short answer is: technically yes, practically no — and the "practically no" matters more than most explanations let on.
Blockchains can store raw data. Ethereum has a calldata field in transactions where arbitrary bytes can be included. Bitcoin can embed small amounts of data using OP_RETURN outputs. The blockchain will record it, and every node on the network will store it — forever. But "can" and "should" are different questions here, and the economics make large file storage effectively impossible at any real scale.
The blockchain's job is to maintain an agreed-upon record of state changes: who owns what, which transactions occurred, what smart contract code exists. It accomplishes this by having every full node in the network store and verify its own complete copy of the ledger.
That replication is what makes blockchains secure and trustless. It's also what makes large file storage structurally absurd.
Every full node on the Ethereum network stores the chain's blocks and current state, and archive nodes additionally keep every historical state. A full node's footprint is currently over one terabyte, depending on the client, and that's mostly lightweight data: smart contract interactions, token transfers, balance changes. If developers started routinely using Ethereum to store images or documents, that figure would grow far faster than the network is designed to absorb.
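The cost mechanism reflects this directly. On Ethereum, every byte of calldata costs gas: 16 gas per non-zero byte and 4 gas per zero byte under the long-standing EIP-2028 pricing. A 1MB image made up mostly of non-zero bytes therefore needs about 16 million gas, which works out to somewhere between $50 and several thousand dollars depending on the gas price and the price of ETH. A 100MB file needs about 1.6 billion gas: it would cost more than most people's monthly salary, and it would not fit in a single block, so it would have to be split across many transactions. A feature film would be economically impossible at any realistic gas price. And calldata is the cheap option; writing the same bytes into contract storage costs far more per byte.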
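The arithmetic behind those figures is easy to check. The sketch below is a rough estimate only: it assumes 16 gas per non-zero calldata byte (the figure cited above) and 4 gas per zero byte (the EIP-2028 price), ignores the fixed per-transaction cost, and uses hypothetical gas prices and a hypothetical ETH price of $3,000 purely to show the spread.

```python
# Assumed prices: 16 gas per non-zero calldata byte (the figure cited above)
# and 4 gas per zero byte (EIP-2028).
GAS_PER_NONZERO_BYTE = 16
GAS_PER_ZERO_BYTE = 4


def calldata_gas(data: bytes) -> int:
    """Gas charged for the calldata bytes alone (no base transaction fee)."""
    zero_bytes = data.count(0)
    nonzero_bytes = len(data) - zero_bytes
    return zero_bytes * GAS_PER_ZERO_BYTE + nonzero_bytes * GAS_PER_NONZERO_BYTE


def cost_usd(gas: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Convert gas to dollars; 1 gwei is 1e-9 ETH."""
    return gas * gas_price_gwei * 1e-9 * eth_usd


# Worst case: a 1 MB payload with no zero bytes (compressed images come close).
payload = b"\xff" * 1_000_000
gas = calldata_gas(payload)

# Hypothetical market conditions, chosen only to show the spread.
for gwei in (1, 20, 100):
    print(f"{gwei:>3} gwei: {gas:,} gas, about ${cost_usd(gas, gwei, 3000.0):,.0f}")
```

Under these assumed prices the same 16 million gas costs about $48 at 1 gwei and about $4,800 at 100 gwei, which is where the wide range in the estimate comes from.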
Bitcoin is even more constrained. Under the long-standing default relay policy, an OP_RETURN output carries up to 80 bytes. Eighty bytes. That's shorter than a tweet, let alone a file.
What blockchains store well is references, not files.
The standard pattern: upload a file to a decentralized storage network — IPFS or Arweave, for example — then commit a cryptographic hash of that file to the blockchain. The hash is short (32 to 64 bytes typically), and it functions as a tamper-proof pointer. If the file is ever changed, the hash changes, and the mismatch is immediately detectable. The chain doesn't hold the file; it holds the proof that a specific file existed at a specific time.
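A minimal sketch of that tamper check, assuming plain SHA-256 over the file bytes. This is an illustration of the idea rather than how IPFS derives its content identifiers, and the `on_chain_record` dictionary is a hypothetical stand-in for whatever contract or transaction actually holds the digest.

```python
import hashlib


def content_hash(data: bytes) -> bytes:
    """SHA-256 digest of the file contents (32 bytes)."""
    return hashlib.sha256(data).digest()


def matches_anchor(data: bytes, anchored_hash: bytes) -> bool:
    """True if the retrieved file is byte-for-byte the one that was anchored."""
    return content_hash(data) == anchored_hash


# Hypothetical stand-in for the on-chain record: only the digest is stored.
original = b"contract.pdf contents"
on_chain_record = {"doc_hash": content_hash(original)}

print(len(on_chain_record["doc_hash"]))                                      # 32
print(matches_anchor(original, on_chain_record["doc_hash"]))                 # True
print(matches_anchor(original + b" (edited)", on_chain_record["doc_hash"]))  # False
```

The file can be any size; the on-chain footprint stays at 32 bytes, and anyone who later retrieves the file can rerun the check without trusting whoever served it.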
This is how most NFTs actually work, which surprises people who assume the blockchain holds the image. Usually it doesn't. What the NFT token record contains is a metadata URI — often an IPFS content identifier, or CID — that points to a JSON file describing the asset. That JSON file in turn points to the actual image hosted on IPFS or another storage service.
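The two-hop lookup (token URI to metadata JSON to image) can be sketched as follows. This is an offline illustration under stated assumptions: the CIDs are placeholders rather than real identifiers, the metadata document is a made-up example with an `image` field in the style of the ERC-721 metadata schema, and `https://ipfs.io/ipfs/` is just one public gateway among many. No network request is made.

```python
import json

# Public gateway used here as an example; any IPFS gateway would do.
DEFAULT_GATEWAY = "https://ipfs.io/ipfs/"


def to_gateway_url(uri: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Rewrite an ipfs:// URI to an HTTP gateway URL; leave other URIs alone."""
    prefix = "ipfs://"
    if uri.startswith(prefix):
        return gateway + uri[len(prefix):]
    return uri


def image_url_from_metadata(metadata_json: str) -> str:
    """Second hop: read the image pointer out of the metadata document."""
    metadata = json.loads(metadata_json)
    return to_gateway_url(metadata["image"])


# Hypothetical values: <metadata-cid> and <image-cid> are placeholders, not real CIDs.
token_uri = "ipfs://<metadata-cid>/1.json"  # what the token contract would return
metadata_json = '{"name": "Example #1", "image": "ipfs://<image-cid>/1.png"}'

print(to_gateway_url(token_uri))
print(image_url_from_metadata(metadata_json))
```

Note what the chain contributes here: only the first string. Everything after it is fetched from systems the blockchain does not control, and if the metadata's `image` field is a plain `https://` URL the function passes it through unchanged, which means the image lives on an ordinary web server.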
The blockchain guarantees who owns the token. It doesn't guarantee the image still exists. That's a genuinely important distinction. Several NFT collections have lost access to their imagery because the images were hosted on centralized servers that went offline. The token persists on-chain; the asset it referenced is gone.
Arweave addresses this more directly than IPFS by building permanent storage incentives into the protocol — but even that's a separate system sitting alongside the blockchain, not embedded in it.
There are three binding constraints worth understanding separately.
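Block size limits come first. Every blockchain imposes caps on how much data fits in a single block. Ethereum caps blocks by gas rather than by bytes: the base fee adjusts dynamically to keep usage near a gas target set at half the block gas limit. At 16 gas per non-zero byte, even a block filled entirely with calldata holds a few megabytes at most, and typical blocks are far smaller. Bitcoin limits blocks to 4 million weight units, which means about 1MB of non-witness data and somewhat more once witness data is counted. These caps exist to ensure that nodes running on reasonable hardware can keep up with block propagation across the network. Larger blocks slow propagation and raise the minimum cost of running a full node, and both effects push toward centralization.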
Node replication is the second constraint. Because all full nodes store all data, storage costs compound across the entire network. One node storing a 1GB file is trivial. Ten thousand nodes each storing that same 1GB is ten terabytes of aggregate storage committed permanently to a single file. The economics don't scale.
Fee markets are the third. Gas fees on Ethereum represent competition for block space. Large data submissions compete with smart contract interactions for that space, and developers who need storage don't want to pay fees priced for financial settlement. The market naturally routes data off-chain.
EIP-4844 (proto-danksharding) activated in March 2024 and introduced "blobs": temporary data packets that rollups use for their transaction data. Blobs are cheaper than permanent calldata because they are priced in their own fee market and nodes are only required to keep them for about 18 days (4096 epochs) before pruning. They're a meaningful cost reduction for Layer 2 networks posting transaction batches to Ethereum.
But blobs don't solve the large-file problem. They're temporary by design, still size-limited, and purpose-built for rollup batch data, not general storage.
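The size and retention limits follow from a handful of protocol constants. The sketch below uses the values from the EIP-4844 and consensus-layer specifications; the per-block maximum of 6 blobs is the limit at launch and is treated here as a parameter, since later upgrades can change it.

```python
# Constants from the EIP-4844 and consensus-layer specifications.
FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 32
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12


def blob_size_bytes() -> int:
    """Size of one blob: 4096 field elements of 32 bytes each."""
    return FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT


def retention_days() -> float:
    """Minimum time nodes must serve a blob before they may prune it."""
    seconds = MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
    return seconds / 86_400


def max_blob_bytes_per_block(max_blobs: int = 6) -> int:
    """max_blobs=6 is the limit at EIP-4844 launch; later upgrades may change it."""
    return max_blobs * blob_size_bytes()


print(blob_size_bytes())             # 131072 bytes (128 KiB)
print(round(retention_days(), 1))    # 18.2
print(max_blob_bytes_per_block())    # 786432 bytes (768 KiB)
```

So a blob is 128 KiB, a block at launch carries well under a megabyte of blob data, and the guaranteed availability window is a little over two weeks, about 18 days. None of those numbers fit a use case that needs a large file to stay retrievable.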
The more relevant development is the maturing of dedicated decentralized storage networks that sit alongside blockchains: Arweave for permanent storage with one-time payment, IPFS for content-addressed storage (with the caveat that someone needs to pin your files), and Filecoin for incentivized storage via an active marketplace. These systems are increasingly the correct infrastructure layer for any application needing to store more than a few hundred bytes.
The hybrid pattern — content hash on-chain, file off-chain — is well-established and increasingly standardized. It isn't a workaround or a compromise. It's the right architecture.
If Layer 1 block size limits remain stable or shrink — which is the likely direction, given ongoing concern about state growth and the minimum cost of running a node — this constraint strengthens. Ethereum's roadmap (including state expiry and statelessness research) explicitly frames large on-chain data as a problem to manage, not a feature to expand.
Continued adoption of hybrid storage patterns, combined with growing decentralized storage network usage, would confirm the direction.
A blockchain that redesigned itself to treat storage as a first-class primitive — with different node tiers, storage-specific sharding, and pricing models calibrated to real storage economics — could behave differently. Arweave is the clearest example: it uses a blockchain-like structure but optimizes entirely for permanent cheap storage rather than transaction processing. It's a different design from the ground up.
At the margin, a network willing to tolerate significantly higher hardware requirements for node operators could allow larger blocks and cheaper on-chain data. That's a tradeoff against decentralization, and the major networks have generally not been willing to make it.
This constraint isn't going away for mainstream chains in any near-term window. If you're building something that needs to store documents, images, or larger data, the correct pattern today is: use decentralized storage and anchor the content hash on-chain. That's not a stopgap — it's the established engineering practice.
The question of which decentralized storage network to use is a separate evaluation with real tradeoffs around cost, permanence, and censorship resistance. But the architectural separation between storage and settlement is stable.
This post explains why blockchains aren't suited for large file storage and how the hybrid on-chain reference pattern works. It doesn't evaluate specific decentralized storage networks against each other, address every chain's data tolerance, or assess whether on-chain data storage is appropriate for any particular use case. Those require specific context.
The mechanism described — replicated state, fee markets, block size limits — applies to Bitcoin, Ethereum, and most networks that prioritize decentralization. Some specialized chains have made different tradeoffs.




