How Blockchain Indexers Work

Blockchains are optimized for tamper-resistance, not retrieval. Indexers solve this by continuously reading on-chain data, decoding it, and writing it into queryable databases — making real-time dApps possible.
Lewis Jackson
CEO and Founder

Blockchains are terrible databases. That's not an insult — it's a design tradeoff. A blockchain is optimized for tamper-resistance, not retrieval. The data is organized by block and transaction, not by the questions anyone actually wants to ask.

If you want to know the token balance of every address that interacted with a specific smart contract over the last 90 days, you'd technically need to replay every block from deployment and track state changes manually. That's not a hypothetical — it's exactly the kind of query that makes raw blockchain data impractical for applications.
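To make the cost concrete, here is a minimal sketch of the "replay everything" approach, using a toy in-memory chain rather than real RPC calls. On a live chain, fetching each block is a network round trip, so a scan like this across millions of blocks is impractical for an application to run on demand.

```python
# Naive approach: replay every block and tally per-address balances for
# one contract. This is a toy model — real block data is encoded and
# each block fetch is an RPC round trip.

def balances_from_replay(blocks, contract):
    balances = {}
    for block in blocks:  # one pass over the ENTIRE chain history
        for tx in block["txs"]:
            if tx["to_contract"] != contract:
                continue
            # Each transfer debits the sender and credits the receiver.
            balances[tx["from"]] = balances.get(tx["from"], 0) - tx["value"]
            balances[tx["to"]] = balances.get(tx["to"], 0) + tx["value"]
    return balances

# Toy chain: two blocks touching one tracked contract, "TOKEN".
chain = [
    {"txs": [{"to_contract": "TOKEN", "from": "alice", "to": "bob", "value": 5}]},
    {"txs": [{"to_contract": "TOKEN", "from": "bob", "to": "carol", "value": 2}]},
]
print(balances_from_replay(chain, "TOKEN"))
# {'alice': -5, 'bob': 3, 'carol': 2}
```

The work scales with the length of the chain, not the size of the answer — which is exactly the mismatch indexers remove.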

Blockchain indexers exist to solve this mismatch. They sit between the raw chain and the applications that need to read it, processing data into structured, queryable formats in real time.

The Core Problem

A blockchain node stores data in a format designed for consensus and verification. Blocks contain ordered lists of transactions. Transactions contain encoded function calls and value transfers. To find anything specific — say, all NFT transfers from a given address, or the historical APY of a lending pool — you'd need to decode the raw transaction data, parse the event logs, and aggregate across thousands of blocks.

This is possible but slow. For production applications — wallets, analytics dashboards, DeFi protocols — querying this way introduces latency that makes the product unusable. A DEX frontend can't wait 30 seconds for a page to load.

Indexers solve this by doing the hard work ahead of time. They continuously read new blocks, decode and parse the relevant events and state changes, and write them into a conventional database — typically PostgreSQL or a similar store — organized in a way that supports fast reads. When an application needs data, it queries the indexer, not the blockchain directly.
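The read–decode–write loop can be sketched in a few lines. The helpers here (`fetch_block`, `decode_events`) are hypothetical stand-ins for a node's RPC interface and an ABI decoder; real indexers add batching, retries, and reorg handling on top of this skeleton.

```python
# Skeleton of an indexer's main loop: read each block once, decode its
# events, and persist structured rows. fetch_block and decode_events are
# toy stand-ins for the node connection and the ABI decoder.

def run_indexer(fetch_block, decode_events, db, start_block, end_block):
    for n in range(start_block, end_block + 1):
        block = fetch_block(n)                # 1. read from the node
        for event in decode_events(block):    # 2. decode logs
            db.append({"block": n, **event})  # 3. write a queryable row
    return db

# Toy chain and decoder for demonstration.
blocks = {1: ["transfer:alice:bob:5"], 2: ["transfer:bob:carol:2"]}
def fetch_block(n): return blocks[n]
def decode_events(block):
    for log in block:
        _, frm, to, val = log.split(":")
        yield {"from": frm, "to": to, "value": int(val)}

rows = run_indexer(fetch_block, decode_events, [], 1, 2)
print(rows)
```

The point is that all the expensive decoding happens once, at write time, so reads are cheap.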

How an Indexer Actually Works

The basic architecture has four components.

Chain connection. The indexer connects to a blockchain node (or a node provider) via RPC and subscribes to new blocks. As each block is finalized, the indexer reads its contents.

Event parsing. Smart contracts emit events — structured logs that record significant state changes. An ERC-20 token transfer, for example, emits a Transfer(address from, address to, uint256 value) event. The indexer decodes these logs using the contract's ABI (the interface specification that defines what events and functions look like) and extracts the meaningful fields.
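As a concrete illustration of what "decoding with the ABI" means for a Transfer event: the two indexed parameters (from, to) live in the log's topics as 32-byte words, and the non-indexed value lives in the data field. The sketch below decodes this layout by hand; a production indexer would use an ABI decoding library instead.

```python
# Hand-decoding an ERC-20 Transfer log. topics[0] is the event signature
# hash; topics[1] and topics[2] are the from/to addresses, left-padded
# to 32 bytes; data holds the uint256 amount as hex.

def decode_transfer(log):
    frm = "0x" + log["topics"][1][-40:]   # last 20 bytes of the padded word
    to = "0x" + log["topics"][2][-40:]
    value = int(log["data"], 16)          # uint256 amount
    return {"from": frm, "to": to, "value": value}

# Example log with illustrative addresses.
log = {
    "topics": [
        "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
        "0x" + "00" * 12 + "aa" * 20,   # from, zero-padded to 32 bytes
        "0x" + "00" * 12 + "bb" * 20,   # to, zero-padded to 32 bytes
    ],
    "data": "0x05",
}
print(decode_transfer(log))
```

The signature hash in topics[0] (keccak256 of `Transfer(address,address,uint256)`) is how the indexer knows which decoder to apply to a given log.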

State tracking. Some data can't be reconstructed from events alone — it requires calling view functions on the contract to read current state. Indexers may periodically snapshot this data and track changes over time.
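A state-tracking pass reduces to periodically reading a view function and recording a timestamped snapshot. The sketch below uses a hypothetical `read_view_fn` in place of a real contract call; the shape of the output — a time series keyed by block — is what matters.

```python
# Periodic state snapshots: values like a pool's current rate can't
# always be reconstructed from event logs, so the indexer samples them
# by block. read_view_fn stands in for a real eth_call to the contract.

def snapshot_state(read_view_fn, snapshots, block_number):
    snapshots.append({"block": block_number, "value": read_view_fn(block_number)})
    return snapshots

# Toy view function: pretend the tracked value grows with block height.
history = []
for n in (100, 200, 300):
    snapshot_state(lambda b: b * 0.01, history, n)
print(history)
```

The resulting series is what lets an indexer answer historical questions ("the APY 90 days ago") that the chain itself only answers at its current tip.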

Database write. Decoded, structured data gets written into a queryable database. The indexer maintains a mapping from on-chain activity to database records, which applications then query via a standard API — usually GraphQL or REST.
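The payoff at the storage layer is that an application's question becomes an indexed database query instead of a chain replay. A minimal sketch using SQLite (standing in for PostgreSQL):

```python
# Decoded transfers become rows; the index on sender is what makes
# the application's read fast.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE transfers (
    block INTEGER, sender TEXT, receiver TEXT, value INTEGER)""")
db.execute("CREATE INDEX idx_sender ON transfers (sender)")

rows = [(1, "alice", "bob", 5), (2, "bob", "carol", 2), (3, "alice", "carol", 1)]
db.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", rows)

# "All transfers sent by alice" — an index lookup, not a chain replay.
sent = db.execute(
    "SELECT receiver, value FROM transfers WHERE sender = ? ORDER BY block",
    ("alice",),
).fetchall()
print(sent)  # [('bob', 5), ('carol', 1)]
```

In production this table sits behind a GraphQL or REST API rather than being queried directly.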

When a reorg occurs (a chain reorganization where blocks are replaced), a well-built indexer needs to detect it, roll back any affected records, and reprocess the correct blocks. This is one of the harder engineering problems in indexing, and not all implementations handle it cleanly.
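The core of reorg handling can be sketched with parent-hash checking: the indexer remembers the hash of each block it processed, and if an incoming block's parent hash doesn't match the stored tip, it rolls records back until the chains agree before reprocessing.

```python
# Minimal reorg handling: pop indexed records from the orphaned fork
# until the incoming block's parent hash matches our tip, then append.

def apply_block(indexed, block):
    """indexed: list of {"number", "hash"} records, oldest first."""
    while indexed and indexed[-1]["hash"] != block["parent"]:
        indexed.pop()  # roll back a record from the orphaned fork
    indexed.append({"number": block["number"], "hash": block["hash"]})
    return indexed

chain = []
apply_block(chain, {"number": 1, "hash": "a1", "parent": None})
apply_block(chain, {"number": 2, "hash": "b2", "parent": "a1"})
# A one-block reorg: block 2 is replaced by a competing block 2'.
apply_block(chain, {"number": 2, "hash": "b2x", "parent": "a1"})
print([b["hash"] for b in chain])  # ['a1', 'b2x']
```

A real implementation also has to roll back every derived database row written from the orphaned blocks, which is where the engineering gets hard.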

Centralized vs. Decentralized Indexing

There are two broad approaches, and they carry meaningfully different trust assumptions.

Centralized indexers (Alchemy, Moralis, Etherscan's underlying data layer) run as managed services. A company operates the infrastructure, indexes the chains, and exposes the data via API. The advantage is reliability and speed. The tradeoff is trust: you're relying on the provider's infrastructure to return accurate, uncensored data. Most consumer applications use centralized indexers because they're easier to integrate and predictably performant.

Decentralized indexers — the paradigm The Graph Protocol is built around — distribute the indexing work across a network of independent operators. In The Graph's model, developers publish subgraphs (indexing schemas that define which contracts and events to track), and a decentralized network of indexers (node operators) processes queries and earns GRT tokens for the work. Curators signal which subgraphs are worth indexing by staking GRT on them. Delegators stake GRT toward indexers to share in rewards without running infrastructure.

The design goal is to make indexed blockchain data verifiable and censorship-resistant — any indexer returning falsified data can be challenged and slashed. In practice, most high-stakes production applications still use centralized providers for latency reasons, while The Graph's decentralized network is more commonly used for DeFi applications and community-built tooling where trust assumptions matter more.

Where the Constraints Live

The hard constraints here are mostly technical. Every indexer has some latency — data is never truly real-time, only near-real-time. The lag is typically seconds for finalized chains, longer for chains with probabilistic finality. Applications that need sub-second freshness can't rely on indexed data for the lowest-latency queries.

Chain reorganizations remain a structural challenge. Shallow reorgs (one or two blocks) happen regularly on most chains. Deeper reorgs are rare but possible. An indexer that doesn't handle reorgs gracefully will serve stale or incorrect data without signaling that it's doing so — which is worse than an outage.

Trust is also a constraint. For centralized providers, the guarantee that data is accurate is purely contractual and reputational. There's no cryptographic proof that a centralized indexer hasn't modified what it's serving. For most applications this is an acceptable tradeoff; for others, it isn't.

What's Changing

The Graph has been expanding its multi-chain support — the decentralized network now covers Ethereum, Arbitrum, Optimism, Polygon, and a growing list of chains. The question of whether decentralized indexing can compete with centralized providers on performance has been a sustained engineering challenge; improvements in query routing and indexer hardware have narrowed the gap, but it hasn't closed.

A newer development worth watching: streaming indexers — tools like Ponder and Envio — designed for real-time event processing rather than batch indexing. These are targeted at applications where latency matters more than historical depth. The ecosystem is fragmenting into specialized solutions rather than consolidating around a single approach.

There's also quiet movement toward on-chain data availability improvements (EIP-4844 and future Ethereum roadmap items) that may eventually reduce the data that needs to be indexed externally. That's a longer horizon.

Confirmation Signals

Continued growth in subgraph deployments on The Graph's decentralized network. Narrowing performance gap between decentralized and centralized indexers for standard query types. Major DeFi applications migrating from centralized providers to decentralized alternatives for trust-sensitive data.

Invalidation

Persistent latency disadvantage preventing decentralized indexing from capturing production workloads beyond community tooling. A significant data falsification event at a major centralized provider, which would accelerate decentralized alternatives but also damage broader trust in indexed data. Ethereum state changes that make current indexing approaches obsolete.

Timing

Now: Centralized indexers (Alchemy, Moralis, The Graph's hosted service) are the dominant infrastructure for most production applications. The tradeoffs are understood and the tools are mature.

Next: Decentralized indexing competition intensifies. Streaming indexers gain adoption for real-time use cases. Multi-chain indexing complexity grows as the number of active chains increases.

Later: If Ethereum's roadmap delivers meaningful on-chain data availability improvements, the indexing landscape could shift structurally. That's speculative at current progress rates.

Boundary Statement

This covers the indexing mechanism and the infrastructure layer. It doesn't address how to build a subgraph, how to choose between providers for a specific use case, or the economics of The Graph's GRT token. Those are separate questions.

The mechanism works as described. Whether a specific indexer provider is appropriate for a specific application depends on the application's trust requirements, latency needs, and chain coverage — factors outside the scope of this explanation.
