GIP-0053 - Enabling Substreams-based Subgraphs (GGP-0025)

Summary

As Substreams becomes a first-class citizen within The Graph Network, the StreamingFast team believes that the Substreams software has been battle-tested enough to be deemed fully production-ready. To ensure that Indexers will index Substreams-backed subgraphs, this GGP proposes to add Oracle support for dataSource.kind == "substreams" to the Feature Matrix shared below.

This would mark an important moment where the performance promises we have made over the last 2 years come to fruition, with a very important indexing-time performance boost as well as an important injection-time performance boost. We are talking 100x and more.

Analysis

With the launch of Uniswap v3, a lot of effort has been put into stabilizing the engine. In doing so, multiple indeterminism issues were resolved. StreamingFast has done extensive testing for determinism, and has validated POI parity between substreams-graph-load (the high-speed injector) and graph-node loading, as well as the gentle hand-off to graph-node with continued parity.

StreamingFast’s testing also showed convergence of POIs during live segments of the chain, even if hit by potential reorgs (this uses the standard support in graph-node so shouldn’t introduce any new risks).

The firehose-ethereum release v1.4.4 (streamingfast/firehose-ethereum on GitHub) is required for deterministic execution.

Potential Risks

Discovery of new indeterminisms is still possible, but the risk is mitigated by the greater ease of cross-checking at different stages of the pipeline. For example, the tool sfeth tools compareblocks can easily compare the source of Substreams between indexers if we need to find discrepancies. Substreams execution can produce flat files that can likewise be compared with great ease. And POIs can be compared like we always do.
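
As an illustration of the flat-file cross-check mentioned above, here is a minimal sketch that compares two directories of output files by content hash. It assumes each indexer has written its flat files to a local directory with matching relative paths; the function names and the directory layout are hypothetical and not part of any Substreams tooling.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root: Path) -> dict[str, Path]:
    """Map each file's path relative to root onto its full path."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}


def diff_output_dirs(dir_a: Path, dir_b: Path) -> dict[str, list[str]]:
    """Report files present on one side only and files whose contents differ."""
    files_a, files_b = list_files(dir_a), list_files(dir_b)
    shared = files_a.keys() & files_b.keys()
    return {
        "only_in_a": sorted(files_a.keys() - files_b.keys()),
        "only_in_b": sorted(files_b.keys() - files_a.keys()),
        "differing": sorted(
            name for name in shared
            if hash_file(files_a[name]) != hash_file(files_b[name])
        ),
    }
```

An empty "differing" list only shows that the two sets of files are byte-identical; it says nothing about whether both runs used the same source blocks.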

Updated Feature Matrix

| Subgraph Feature | Aliases | Implemented | Experimental | Query Arbitration | Indexing Arbitration | Indexing Rewards |
|---|---|---|---|---|---|---|
| Core Features | | | | | | |
| Full-text Search | | Yes | No | No | Yes | Yes |
| Non-Fatal Errors | | Yes | Yes | Yes | Yes | Yes |
| Grafting | | Yes | Yes | Yes | Yes | Yes |
| Data Source Types | | | | | | |
| eip155:* | * | Yes | No | No | No | No |
| eip155:1 | mainnet | Yes | No | Yes | Yes | Yes |
| eip155:100 | gnosis | Yes | Yes | Yes | Yes | Yes |
| near:* | * | Yes | Yes | No | No | No |
| cosmos:* | * | Yes | Yes | No | No | No |
| arweave:* | * | Yes | Yes | No | No | No |
| eip155:42161 | arbitrum-one | Yes | Yes | Yes | Yes | Yes |
| eip155:42220 | celo | Yes | Yes | Yes | Yes | Yes |
| eip155:43114 | avalanche | Yes | Yes | Yes | Yes | Yes |
| eip155:250 | fantom | Yes | Yes | Yes | Yes | Yes |
| eip155:137 | polygon | Yes | Yes | Yes | Yes | Yes |
| Data Source Features | | | | | | |
| ipfs.cat in mappings | | Yes | Yes | No | No | No |
| ENS | | Yes | Yes | No | No | No |
| File data sources: IPFS | | Yes | Yes | No | Yes | Yes |
| Substreams data sources | | Yes | Yes | Yes | Yes | Yes |
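
To make the matrix easier to reason about, the sketch below encodes a few of its rows as data and checks whether a set of features would qualify for indexing rewards. This is only an illustration: the names FeatureSupport, FEATURE_MATRIX and earns_indexing_rewards are hypothetical, and the rule that every feature used must be reward-eligible is an assumption, not a description of how the Oracle is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSupport:
    """One row of the feature matrix, in the same column order."""
    implemented: bool
    experimental: bool
    query_arbitration: bool
    indexing_arbitration: bool
    indexing_rewards: bool


# A subset of rows copied from the matrix; the dictionary keys are an assumption.
FEATURE_MATRIX = {
    "Full-text Search": FeatureSupport(True, False, False, True, True),
    "eip155:*": FeatureSupport(True, False, False, False, False),
    "eip155:1": FeatureSupport(True, False, True, True, True),
    "ipfs.cat in mappings": FeatureSupport(True, True, False, False, False),
    "Substreams data sources": FeatureSupport(True, True, True, True, True),
}


def earns_indexing_rewards(features: list[str]) -> bool:
    """Assumed rule: every feature must be listed and reward-eligible."""
    return all(
        name in FEATURE_MATRIX and FEATURE_MATRIX[name].indexing_rewards
        for name in features
    )
```

Under these assumptions, a subgraph using "eip155:1" and "Substreams data sources" would qualify, while one that also uses "ipfs.cat in mappings" would not.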

Next Steps

Have the Smart Contracts team modify the Oracle configuration to lift the denial on Substreams-based Subgraphs.

Once this proposal is accepted:

Anyone will be able to deploy Substreams-based Subgraphs to the network, and indexers will be able to pick them up, index them with the speed of Substreams, and earn rewards doing so.


Hi @abourget - linking back to the original GIP for this feature, which is pretty representative of the implementation: Substreams into Subgraphs: a simple integration

I think there are still some documentation gaps which it would make sense to fill before rolling out full support, both for operators & developers - in terms of substreams overall (given the recent pace of development), but also specifically for the Graph Node integration. We should be able to make pretty rapid progress on this aspect.

We also previously discussed some more validation and testing with the wider indexer community, as well as determinism testing beyond the Uniswap subgraph.

There have been some recent fixes relating to substreams determinism; are there any outstanding investigations? Should we wait for those fixes to be part of a Substreams release?

Do we want to specify minimum versions for Substreams providers (to encourage upgrading), similar to some other components in the stack, or would you propose that this is an “at arms length” indexer responsibility (more similar to an indexer’s choice of Ethereum client)?

A point of detail: the proposed feature matrix mentions mainnet under Substreams data sources. Is the intention to enable this on a per-network basis?


Congratulations on this significant step. The maturation of Firehose and Substreams, and their impending integration as key components within The Graph Network, is indeed an exciting development. We recognize the work that has been undertaken over the past two years, and we look forward to seeing the fruits of your labors in the form of significant performance enhancements.

While the prospect of substantial indexing-time performance improvements is compelling, it’s crucial that the necessary preparatory steps are taken to ensure the successful and sustainable implementation of these technologies.

To this end, we believe there are a few key requirements:

  1. It is important that an official graph-node is designated as supportive of this proposal. We are currently awaiting the official release of 0.31.0, and we understand that an even more recent version might be necessary.

  2. The release of the substreams-graph-load (the high-speed injector) is essential. It’s crucial that detailed documentation accompanies this release. We also suggest considering indexer training sessions to facilitate a comprehensive understanding and efficient utilization of this tool.

  3. As Adam highlighted, improving the documentation is a priority. Details pertaining not just to the graph-node integration but also to the overall architecture of the Firehose and Substreams infrastructure need to be comprehensively covered, especially around high-capacity support. It is key that we fully understand how Substreams will deliver the proposed 100x performance enhancement.

We want to stress that our suggestions are not solely aimed at guaranteeing a smooth rollout, but also to ensure that all indexers, irrespective of their affiliations, have equal access to the necessary tooling and information. This is a fundamental prerequisite for cultivating a robust, dynamic, and truly decentralized network.


A few things:

  • The PR “Feature/wazero” by sduchesneau (Pull Request #229 in streamingfast/substreams on GitHub) solved all outstanding determinism issues known to date, and that was the conclusion of our exhaustive testing. If more testing is needed, it would need to come from others. I’m unsure where MIPs left off the testing here. Perhaps Pedro can drop the instructions we gave everyone to do that testing.

  • Indeed, we will want to have a minimal Substreams version released where all of the fixes are in. We’ll do that release soon. I’m unsure if we’d need to specify some versions in some files somewhere though.

  • I took out mainnet in the matrix. I understand this is an orthogonal feature. I’m not sure why it was there.

  • Work on Documentation is underway. I’m not sure about the expected line or threshold before we put that up for signature though.

  • substreams-graph-load is cool but not necessary. graph-node can do a fully linear ingestion on its own, with less devops overhead. The tool is also documented in its GitHub repository (streamingfast/substreams-graph-load: “Substreams sink to write CSV files compatible with graph-node postgresql database”), and we’ve tested multiple runs of that doc. This GGP is not about increasing the speed of ingestion. We did graph-load for other reasons, and we’re happy it’s on par in terms of POI, but its comprehension (or lack thereof) should not hold back the availability of this technology on the network, in my opinion.

  • Everything is out in public for people to run. Many have already. I don’t think we should hold back the release of this GGP because training sessions have not happened, or a threshold of people have not yet tried it. The goal of enabling it is precisely to encourage people to jump on it. It’s a chicken and egg thing. This GGP is the incentive to jump onboard.


Thanks for putting this together, Alex! We’re close now.

“If more testing is needed, it must come from others. I’m unsure where MIPs left off the testing here. Perhaps Pedro can drop the instructions we gave everyone to do that testing.”

I think we should have better determinism assurances for the Council to make an informed decision on support for indexing rewards. With MIPs, you (StreamingFast) did a great job listing out the requirements and instructions for Indexers to run graph-load and extract POIs so we could then check for inconsistencies. It didn’t become an official MIPs mission as it was a bit late in the program, and participants were already too busy finishing Fantom and Polygon mainnet missions.

Now that others have developed more substreams-based subgraphs (like Messari), one quick win would be to deploy 2-3 of these to Goerli (The Graph’s testnet) and have some Indexers syncing them. This would allow us to collect the required POIs to analyze with tools like the upcoming Graphix or indexer-agent (cc @Ford). While Edge & Node will be testing this further with more subgraphs on the hosted service, there’s a value in having it tested by other parties too with different setups.

I’ll update this thread soon once we have 2-3 substreams-based subgraphs on Goerli. I’ll also reach out to the Indexer community during tomorrow’s Indexer Office Hours.


Update on the substreams-based subgraphs:

Some Indexers are already syncing these two.
Thanks for your help here!


And this one is sync’d on Arbitrum:

There are 35 indexers here:

Can someone check that? I don’t have handy commands or processes to cross-check that.

The first one you pointed to, can’t we check POIs right away? It’s sync’d for hundreds of thousands of blocks already. Who’s the POI expert around?

Alexandre

The plan is to start using Graphix next week to get a report on POI convergence and understand where divergence happened, if any. E&N is still developing the tool, but the latest estimate is that network mode will be ready by next week.
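
For readers unfamiliar with what a POI convergence report involves, here is a rough sketch of the core idea: group indexers by the POI they report for a given block and see whether they all land in one group. This is not how Graphix is implemented; the function names and the sample indexer names and POI values in the usage note are made up for illustration.

```python
from collections import defaultdict


def group_by_poi(pois_by_indexer: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Group indexers by reported POI for one block, largest group first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for indexer, poi in pois_by_indexer.items():
        # Hex strings are compared case-insensitively.
        groups[poi.lower()].append(indexer)
    return sorted(
        ((poi, sorted(members)) for poi, members in groups.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


def has_converged(pois_by_indexer: dict[str, str]) -> bool:
    """True when every reporting indexer agrees on a single POI."""
    return len(group_by_poi(pois_by_indexer)) == 1
```

With placeholder input such as {"indexer-a": "0xAB", "indexer-b": "0xab", "indexer-c": "0xcd"}, the first two indexers fall into one group and the third into another, so the block would be flagged as diverging.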

I’ll keep reporting here what the findings are.

Also, @Ford mentioned Indexers would need to run an upgraded Indexer Service (Graphix hits Indexers’ API directly), so a new release will be available soon.

  • We compared POIs on the Uniswap substreams release v0.2.6 (streamingfast/substreams-uniswap-v3 on GitHub), between blocks 12.3M and 17M:
    • StreamingFast’s database was filled in parallel from our own substreams deployment, using our graph-load tool, on June 28th.
    • Ellipfra’s database was filled linearly from their own substreams deployment (they produced their own blocks and started with a clean cache) using graph-node directly.

The resulting POIs were identical (compared directly at specific blocks in the PostgreSQL database).

We’ve identified the cause of the previous POI mismatch as an operator error: our uniswapv3 substreams had been indexed from a “poisoned” cache (produced before Substreams release v1.1.5).
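
One way an operator could guard against reusing such a cache is to compare the version that produced it against the first known-good release. The sketch below assumes the operator has recorded that version somewhere as a plain tag like "v1.1.4"; how the tag is obtained, and the helper names, are hypothetical.

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse 'v1.1.5' or '1.1.5' into (1, 1, 5); pre-release suffixes are not handled."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


# Caches produced before this Substreams release are treated as suspect.
MIN_SAFE_VERSION = parse_version("v1.1.5")


def cache_needs_rebuild(produced_with: str) -> bool:
    """True if the cache was produced by a release older than v1.1.5."""
    return parse_version(produced_with) < MIN_SAFE_VERSION
```

Comparing integer tuples rather than strings matters here: as text, "1.1.10" would sort before "1.1.5".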

Unfortunately, the Status endpoint on Indexer’s API currently doesn’t work well for comparing substreams-based subgraphs because the “blocks” cache is not filled completely and the Status endpoint returns a “null” value if it does not know the block hash. This will need to be fixed in a graph-node release for Graphix to work well in these cases.
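
Until that is fixed, any comparison built on the Status endpoint has to treat a “null” answer as “unknown” rather than as a mismatch. Here is a minimal sketch of that three-way classification, assuming the POIs have already been fetched into per-indexer dictionaries keyed by block number; the fetching step and the function name are hypothetical.

```python
from typing import Optional


def compare_pois(
    a: dict[int, Optional[str]],
    b: dict[int, Optional[str]],
) -> dict[str, list[int]]:
    """Classify each block as match, mismatch, or unknown (a POI is missing or null)."""
    report: dict[str, list[int]] = {"match": [], "mismatch": [], "unknown": []}
    for block in sorted(a.keys() | b.keys()):
        poi_a, poi_b = a.get(block), b.get(block)
        if poi_a is None or poi_b is None:
            # A null POI carries no information, so it is not counted as divergence.
            report["unknown"].append(block)
        elif poi_a.lower() == poi_b.lower():
            report["match"].append(block)
        else:
            report["mismatch"].append(block)
    return report
```

Keeping "unknown" separate avoids false alarms when one indexer's block cache is incomplete, while still making it visible how many blocks could not be checked.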