Hello community!
I’m Alex from StreamingFast and you are reading my very first post on these forums.
Following the recent announcement made by the Foundation, our team has been hard at work towards improving indexing performance on The Graph Network.
Today I want to present one of the ideas that helped us achieve great indexing performance in the past. It is called the Firehose.
Here’s the intro:
Abstract
Firehose: a files-based and streaming-first approach to processing blockchain data.
This proposal is a building block of a larger vision of performance optimizations for the indexing stack, paving the way to parallelization.
Goals & Motivation
The goals of this document are:
- to contribute knowledge and insights gathered by StreamingFast in the last 3+ years to The Graph’s ecosystem
- to provide the necessary context to understand StreamingFast’s current contributions to the graph-node implementation
The goal of the Firehose is to provide a way to index blockchain data which:
- is capable of handling high throughput chains (network bound)
- increases linear indexing performance
- increases backfilling performance & maximizes data agility by enabling parallel processing
- reduces risks of non-deterministic output
- improves testability and developer experience when iterating on subgraphs
- simplifies the operations of an indexer by relying on flat data files instead of live processes like an archive node
- reduces the need for RPC calls to nodes
…
It is a rather deep technical document, but I hope some of the best brains will meditate on it and provide suggestions and comments. I’d ask you to concentrate all feedback and comments in this thread.
I believe that, by introducing new ideas and having them collide with those of this community, we together will be able to create the greatest open data platform of all time.