Proposal: Rethinking Indexing in the Horizon Era

:police_car_light: The Current Situation

Today, The Graph’s indexing economy faces three major issues:

  1. Redundancy without coordination

    • 100+ Indexers often index the same subgraph, duplicating heavy compute work.

    • Infra costs keep rising while indexing rewards shrink.

  2. Inconsistency and disputes

    • Different environments (RPC nodes, configs, infra) can lead to POI discrepancies — even when Indexers operate honestly.

    • There’s no universal “right answer,” only majority agreement.

  3. Unsustainable economics

    • Indexing rewards are still inflation-based, not tied to real usage.

    • Queries are supposed to dominate the economy long-term, but indexing costs remain high while revenues stay thin.

Competition is great when it drives performance. But at the indexing layer, deterministic tasks repeated by everyone are becoming wasteful rather than valuable.

:bullseye: Strategic Question

Do we need 100+ Indexers recomputing the same subgraphs and slashing each other over POI disputes?

Or do we need a consensus-driven indexing layer feeding into scalable storage and query layers, where competition thrives where it matters: serving users?

:puzzle_piece: A Three-Layer Model for Horizon

Horizon is already moving The Graph toward a modular, service-based architecture with stronger service-level commitments. Building on that, we can envision a layered approach:

1. Indexing Layer (Consensus Compute)

  • Subset of Indexers perform the actual indexing.

  • Outputs are validated via consensus / proofs, then sealed.

  • Rewards tied to verified compute work (like PoW or Proof-of-Computation).

:backhand_index_pointing_right: Removes redundant compute, ensures correctness, and fairly compensates for the heaviest task.

2. Storage Layer (Durable Availability)

  • Sealed subgraph data is replicated and stored by specialized providers.

  • Rewards tied to availability proofs, similar to Filecoin.

  • Replication is cheap, predictable, and widely distributed.

:backhand_index_pointing_right: Separates compute from availability, lowers costs, strengthens resilience.

3. Query Layer (Utility-Driven Service)

  • Operators serve queries against verified, sealed datasets.

  • Anyone can participate, competing on latency, reliability, coverage, and cost.

  • Queries become the dominant economic driver, as Horizon envisions.

  • Decentralized gateways become easier to implement, since queries reference sealed data.

:backhand_index_pointing_right: Turns the protocol into a scalable, predictable query market — aligned with real user demand.
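
To make the split concrete, here is a rough sketch of the three roles as separate service interfaces. Everything below is hypothetical (the type and method names are made up for illustration); it only shows how responsibilities could be divided, not an existing Horizon API.

```typescript
// Hypothetical service boundaries for the three-layer model.
// None of these interfaces exist today; they only sketch how the
// responsibilities could be split across roles.

interface SealedDataset {
  subgraphDeploymentId: string;               // IPFS hash of the subgraph manifest
  epochRange: { start: number; end: number }; // range covered by this sealed state
  stateCid: string;                           // content identifier of the sealed output
  proof: string;                              // consensus / validity proof over the output
}

interface IndexingService {
  // Consensus compute: index a deployment for an epoch range and seal the result.
  indexAndSeal(
    subgraphDeploymentId: string,
    epochRange: { start: number; end: number }
  ): Promise<SealedDataset>;
}

interface StorageService {
  // Durable availability: replicate sealed data and answer availability challenges.
  pin(dataset: SealedDataset): Promise<void>;
  proveAvailability(stateCid: string, challenge: Uint8Array): Promise<string>;
}

interface QueryService {
  // Utility-driven serving: answer GraphQL queries against a sealed dataset.
  query(stateCid: string, graphqlQuery: string): Promise<unknown>;
}
```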

:white_check_mark: Benefits

  • Efficiency: End redundancy at the indexing layer.

  • Correctness: Consensus-backed outputs remove ambiguity in POIs.

  • Economic sustainability: Indexing rewarded fairly, but the economy shifts toward queries.

  • Scalability: Global demand met by replicated storage + distributed query operators.

  • Horizon alignment: Leverages Horizon’s modular architecture and SLAs, but extends them toward specialization.

:balance_scale: Tradeoffs

  • More complex architecture (3 roles vs. 1).

  • Governance needed for consensus at the indexing layer.

  • Risk of centralization if indexing is restricted to too few participants.

  • Transition challenges — migrating from today’s model to layered roles must be gradual.

:telescope: How This Fits Horizon

Horizon is already introducing:

  • Modular data services (not just subgraphs).

  • Service-level agreements and slashing.

  • Permissionless roles with differentiated responsibilities.

The layered model extends this trajectory by:

  • Treating indexing, storage, and querying as distinct service classes.

  • Shifting competition away from redundant compute toward query performance.

  • Creating a pathway for query-first economics, which Horizon has identified as the long-term vision.

:red_question_mark: Open Questions for the Community

  1. How should indexing rights be assigned — election, bonding, random sampling, or open competition with consensus proofs?

  2. Should indexing become a rotating duty (like block producers) or remain open to all?

  3. What consensus mechanism is best for verifying indexed data? Challenge-response, deterministic replay, or even ZK proofs?

  4. Should storage be incentivized as a standalone role, or bundled with indexing?

  5. How do we transition from today’s all-indexers-do-everything model to layered specialization without disruption?

  6. What balance of decentralization vs. efficiency is acceptable at the indexing layer?

:compass: Conclusion

The Horizon upgrade is an opportunity to rethink fundamentals. The current flow — every Indexer indexing every subgraph independently — has proven expensive, inconsistent, and economically fragile.

A layered approach — indexing (consensus), storage (replication), queries (service) — could provide scalability, predictability, and long-term sustainability, while still staying true to The Graph’s ethos of open participation.

The question is not if we need to evolve the indexing flow — it’s how soon and in what shape. Horizon gives us the tools to experiment with this evolution.

:backhand_index_pointing_right: This piece is not a final proposal. It’s an invitation to the Indexer community and The Graph Foundation:

  • Should we continue with redundant indexing as the baseline?

  • Or should we experiment with layered specialization that aligns with Horizon’s modular vision and pushes us toward a query-first economy?


…. thanks for writing this, @stakemachine :eyes:


Great write-up @stakemachine!

All of your points are spot on. The one that sticks out, though, is that I doubt we’ll ever be able to have some kind of subgraph-indexing consensus layer, since many of these subgraphs are way too chonky to keep at chain head for the numerous good-faith operators who are beholden to inferior clients and/or the blockchains themselves.

One of the biggest issues in the protocol is Curation. Curation is a massive blocker to funneling compute power and serving capacity to the correct spot, and it also happens to be the biggest rugging operation known to the protocol: first-in wins, everyone after loses. There needs to be a way for publishers to launch valuable subgraphs for a small price, gather curators who see that value, and not have to worry about the massive GRT rug that could happen as a result of curation. When this is solved, I think we’ll see much better organic action within the protocol that could address many of our complaints.

Hey @86b, I just wanted to point out something on curation: the new mechanism introduced by The Graph when migrating to Arbitrum replaces the old curation bonding curve (which had early-mover advantages) with what’s often referred to as a flat bonding curve. See GIP-0039: Curation v1.x.

But how does that work in reality? Can you give some simple equations to back this so we can take a look at the mechanics?

I attached the link to GIP-0039 above, which has the math (not gonna lie, the math is over my head). In practical terms:

Under GIP-0039, if someone signals 100 GRT when a subgraph launches, they don’t automatically profit from others signaling later; their earnings now come mainly from query fees generated by real usage. When early curators un-signal (burn), it no longer affects those who stay in. Over time, query fees still add value to everyone’s shares, so if a second curator signals 100 GRT months later, they might receive slightly fewer shares because query fees have already been added to the pool.
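
For the mechanics with toy numbers, a proportional (“flat”) share pool behaves roughly like the sketch below. This is illustrative only; the exact formulas are in GIP-0039, it ignores any taxes or fees the real contracts apply, and the class and variable names are made up.

```typescript
// Illustrative pool accounting only: NOT the exact GIP-0039 formulas.
// A "flat" curve mints shares proportionally to the deposit, so there is
// no early-mover premium; query fees added to the pool raise the value
// of all shares pro rata.

class CurationPool {
  totalTokens = 0; // GRT held by the pool
  totalShares = 0; // curation shares outstanding

  signal(grt: number): number {
    // First curator: 1 share per GRT (arbitrary bootstrap ratio).
    const shares = this.totalShares === 0
      ? grt
      : (grt * this.totalShares) / this.totalTokens;
    this.totalTokens += grt;
    this.totalShares += shares;
    return shares;
  }

  unsignal(shares: number): number {
    // Burning shares returns a pro-rata slice of the pool; the value of
    // the remaining curators' shares is unchanged.
    const grt = (shares * this.totalTokens) / this.totalShares;
    this.totalTokens -= grt;
    this.totalShares -= shares;
    return grt;
  }

  addQueryFees(grt: number): void {
    // Fees grow the pool without minting new shares, so every existing
    // share appreciates equally.
    this.totalTokens += grt;
  }
}

// Example: curator A signals 100 GRT at launch, 10 GRT of query fees
// accrue, then curator B signals 100 GRT months later.
const pool = new CurationPool();
const a = pool.signal(100); // A gets 100 shares
pool.addQueryFees(10);      // pool is now worth 110 GRT
const b = pool.signal(100); // B gets ~90.9 shares (each share is now worth more)
console.log(a, b, pool.unsignal(a)); // A can exit with 110 GRT; B's position is unaffected
```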

Addendum: Content-Addressed Storage as the Backbone of a Verifiable Storage Layer

To support the proposed separation between indexing, storage, and query execution in a more scalable and verifiable way, we can take technical inspiration from ipfs-ethdb — which reimplements Ethereum’s state database using IPFS-backed, content-addressed storage.

This model can be adapted to The Graph’s architecture as follows:

:counterclockwise_arrows_button: Indexing → Sealing → Replication → Querying

Step 1: Indexers produce sealed state

  • A set of designated Indexers (elected, bonded, or probabilistically selected) indexes specific subgraphs for a defined epoch or block range.

  • Instead of emitting mutable local DB snapshots, they output:

    • A deterministic serialized representation of the subgraph state (e.g., a sorted JSON/Trie/Merkle form),

    • A content identifier (CID) for that dataset (e.g., IPFS hash),

    • And optionally a state manifest (metadata about block height, subgraph version, schema hash, etc).

  • Multiple Indexers working on the same subgraph must agree on the same CID, forming a quorum (N of M) for verification; a rough sketch of this output follows below.
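
To make Step 1 concrete, here is a minimal sketch of “deterministic serialization → CID → manifest.” It assumes a canonical sorted-JSON encoding and uses a plain sha-256 digest as a stand-in for a real IPFS CID (an actual implementation would use the multiformats stack); all names are illustrative, not graph-node output.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sealed-state output for one subgraph + block range.

interface StateManifest {
  subgraphDeploymentId: string;
  schemaHash: string;
  startBlock: number;
  endBlock: number;
  stateCid: string; // sha-256 hex here; a real system would emit an IPFS CID
}

// Deterministic serialization: stable key ordering so every honest Indexer
// produces byte-identical output for the same entity set.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function sealState(
  deploymentId: string,
  schemaHash: string,
  startBlock: number,
  endBlock: number,
  entities: Record<string, unknown> // the indexed subgraph state
): StateManifest {
  const bytes = canonicalize(entities);
  const stateCid = createHash("sha256").update(bytes).digest("hex");
  return { subgraphDeploymentId: deploymentId, schemaHash, startBlock, endBlock, stateCid };
}
```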

Step 2: Seal and publish

  • Once consensus is reached on the indexed state, the CID is considered sealed and final for that subgraph & epoch range.

  • This sealed state is pushed to a distributed storage layer (e.g., IPFS, IPFS+Filecoin, Arweave).

  • A registry of verified sealed states can be published on-chain or via an off-chain oracle for others to reference (a quorum-and-registry sketch follows below).
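
A sketch of how Step 2’s N-of-M sealing could work, with an in-memory map standing in for the on-chain registry or oracle. Again, purely illustrative.

```typescript
// Hypothetical N-of-M sealing: a state CID is sealed once enough of the
// designated Indexers report the same value for the same deployment and
// epoch range. The Map stands in for an on-chain registry or oracle.

interface Submission {
  indexer: string;  // indexer address
  stateCid: string; // CID that indexer computed for the assigned range
}

const sealedRegistry = new Map<string, string>(); // "deployment:start-end" -> sealed CID

function sealIfQuorum(submissions: Submission[], quorum: number): string | null {
  const votes = new Map<string, number>();
  for (const s of submissions) {
    votes.set(s.stateCid, (votes.get(s.stateCid) ?? 0) + 1);
  }
  for (const [cid, count] of votes) {
    if (count >= quorum) return cid; // agreement reached
  }
  return null; // no quorum yet: dispute, re-run, or wait for more submissions
}

function recordSeal(key: string, submissions: Submission[], quorum: number): boolean {
  const cid = sealIfQuorum(submissions, quorum);
  if (cid === null) return false;
  sealedRegistry.set(key, cid); // sealed and final for this subgraph & epoch range
  return true;
}
```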

Step 3: Storage Providers replicate

  • A separate class of operators — Storage Providers — replicate and retain the sealed state data.

  • These providers can be rewarded via:

    • Proofs of retrievability / availability (like Filecoin’s Proof-of-Spacetime),

    • Or a delegated protocol mechanism (e.g., “store this CID for X epochs and prove it when challenged”).

  • These nodes do not need to index; they only serve verified data blocks (a minimal availability-challenge sketch follows below).
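
As a toy version of “store this CID for X epochs and prove it when challenged”: the provider must hash a random challenge nonce together with the stored bytes, which it can only do if it actually retained the data. Real availability proofs (e.g. Filecoin’s Proof-of-Spacetime) are far more sophisticated; this only shows the shape of the interaction.

```typescript
import { createHash } from "node:crypto";

// Toy availability challenge. In practice the verifier would sample chunks
// and check Merkle proofs against the CID rather than hold a full copy.

function challengeResponse(nonce: Uint8Array, sealedBytes: Uint8Array): string {
  // Provider side: can only be computed if the sealed bytes are still held.
  return createHash("sha256").update(nonce).update(sealedBytes).digest("hex");
}

function verifyChallenge(
  nonce: Uint8Array,
  referenceBytes: Uint8Array, // verifier's copy or sampled chunk
  providerResponse: string
): boolean {
  return challengeResponse(nonce, referenceBytes) === providerResponse;
}
```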

Step 4: Query Nodes serve user requests

  • Query Nodes (which can be run by Indexers, gateways, or external service providers) retrieve the sealed subgraph state from content-addressed storage.

  • Since the data is sealed and agreed-upon, they can confidently serve GraphQL queries without reindexing or local computation.

  • If the state CID doesn’t match a verified manifest, the query provider can reject it or fall back to another replica, as sketched below.
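
Step 4 then reduces to “only serve data whose hash matches a sealed manifest.” A rough guard, assuming the same sha-256 stand-in for CIDs as above and a hypothetical fetchByCid function (e.g. an IPFS gateway client):

```typescript
import { createHash } from "node:crypto";

// Hypothetical guard for a query node: before serving queries from a
// downloaded sealed state, check that its hash matches the CID recorded
// in the verified manifest; otherwise refuse and fall back to another copy.

interface VerifiedManifest {
  stateCid: string;
  schemaHash: string;
}

function verifySealedState(bytes: Uint8Array, manifest: VerifiedManifest): boolean {
  const digest = createHash("sha256").update(bytes).digest("hex");
  return digest === manifest.stateCid;
}

async function loadForServing(
  fetchByCid: (cid: string) => Promise<Uint8Array>, // hypothetical content-addressed fetch
  manifest: VerifiedManifest
): Promise<Uint8Array> {
  const bytes = await fetchByCid(manifest.stateCid);
  if (!verifySealedState(bytes, manifest)) {
    throw new Error("sealed state does not match verified manifest; refusing to serve");
  }
  return bytes; // safe to load into the GraphQL engine
}
```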

Who Does What?

| Role | Responsibility | What they can use |
|---|---|---|
| Indexers | Index assigned subgraphs deterministically, produce CID + manifest, publish proof | Raw chain data; generate sealed subgraph state |
| Storage Providers | Replicate and persist sealed states (by CID), prove availability on request | Sealed subgraph snapshots (no need to reindex) |
| Query Nodes | Retrieve verified sealed state, execute read-only GraphQL queries for clients | Pull and serve verified data; no indexing needed |

Benefits

  • Single Source of Truth: Sealed subgraph states, addressed by CID, reduce ambiguity and eliminate POI disputes.

  • Lightweight Query Serving: Any node can serve queries without running heavy indexing infrastructure.

  • Modular Participation: Operators can specialize — index, store, or serve — depending on capability.

  • Auditability: Any user or verifier can fetch a state CID and check its contents, schema version, and metadata.

  • Redundancy & Reliability: Storage and serving responsibilities can be decentralized and regionally distributed.


Open Technical Considerations

  • Serialization format: What is the deterministic, schema-bound output format for sealing subgraph states?

  • Consensus mechanism: How is quorum determined for sealing a CID? What if validators disagree?

  • Storage incentives: How do we enforce long-term availability and prevent garbage collection?

  • State upgradeability: How do sealed states evolve with subgraph upgrades or schema changes?

  • Partial queries / pagination: Can content-addressed chunks be queried efficiently at scale?


Summary

By integrating content-addressed storage into The Graph’s storage layer, we can reduce redundancy, increase verifiability, and make query-serving cheaper and more accessible. This model transforms subgraph indexing from a fragmented, duplicated effort into a coordinated, verifiable, and reusable process — enabling scalable growth for the network.

This aligns naturally with Horizon’s modular vision and gives us a clean path toward a query-first protocol economy.


Hey @stakemachine, thanks for posting this. At Edge & Node we’ve been working on the next generation of data services on The Graph (on top of Horizon, of course), and we want to incorporate the lessons learned from the subgraphs protocol when doing so. I think what we’re planning will address the concerns that you mention, but we’ll likely start by proposing a simpler architecture (without so many layers). We’ll be sharing more in the coming weeks, but I wanted to leave this note so that you know these concerns are noted and we are working on fixing them.


I’m glad to see this brought up. The Graph is such a cool technology, and it’s only growing. But the lack of income capture is clearly reflected in Indexers selling their inflation rewards just to keep operating. That’s not going to last forever as the token price drifts lower and lower. I’m not smart enough to have the answer, but protocol income has to catch up, and inflation needs to be voted down to at least stasis. Modest deflation would be even better, but that’s probably a target for a few years down the line.