Hey, lots brewing here. Check it out:
Looking back
- Shipped a Google PubSub sink (GitHub: streamingfast/substreams-sink-pubsub). See the announcement: "New Substreams Sink: Google Cloud PubSub" (StreamingFast on Medium, Mar 2024).
- Shipping an updated https://substreams.dev
- Shipped a Firehose for Bitcoin. Yes, announcements are still pending… boy oh boy…
- Worked on cost estimation for running and consuming Substreams. Live at https://cost.streamingfast.io but not publicly documented or accessible yet.
- Rolled out Substreams optimizations that unclogged some slow-running streams under certain conditions. We've received lots of happy feedback on this.
- For Operator, we fixed a long-standing bug with the dns:// routing of internal components (tier1-to-tier2 comms): it just didn't work, even though it said it did. Well, it does now!
- Deployed blockmeta services for all chains supported at StreamingFast. This will speed up the release of new networks to The Graph Network. See details in blockmeta-service/server/README.md (develop branch, streamingfast/blockmeta-service on GitHub).
- Pitched a vision of modules as a Shared Intelligence Layer, seeded at the last Messari Mainnet conf. See: "Shared Intelligence Layer — How The Graph's Decentralized Intelligence Layer Paves the Way for Rapid Innovation" (StreamingFast on Medium, Feb 2024).
Looking ahead
In the next few weeks, we're going to be diving deep into Substreams optimizations: implementing blockIndex/blockFiltering at the Substreams level, alongside cost optimizations when reading cached files.
- This feature already exists within Firehose, and it's what produced a 90% reduction in costs for sparse Subgraphs when we first integrated it into graph-node. You'll be able to write a freeform Substreams module, have the whole chain indexed once, and then query those indexes to jump only to blocks with content you're interested in. For example, we'll have public modules that index all the ERC-20 from/to addresses, and another module indexing the contracts being called. That will be a chain-agnostic way to sift through chains at extreme speeds. See this issue and its dependencies: "Create a substreams module that filters out (reduce) the event logs based on the same language as the blockindex" (streamingfast/substreams#415).
- This will avoid reading full blocks when cached files already cover what's needed. The current engine is clocked on reading a stream of full merged-blocks, making it difficult to avoid reading them when everything has already been processed. We're going to flip the execution of tier2 jobs over to dynamically load only what is necessary and what is missing (from the output module up the parent tree). See issue: "Make tier2 jobs able to run only on cached mapper and stores outputs (without blocks when not needed)" (streamingfast/substreams#418).
These two major features will turn the Substreams engine into something much more powerful, allowing even wallets, accounting firms, and tax software to use it as a high-speed backend, without the need to index everything into live databases, with the incurred costs of RAM, CPU, disks, etc. We're getting closer to a BigQuery-style engine that is historical, but also real-time and fork-aware, with its beautiful cursor-based navigation.
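Fork-aware, cursor-based navigation is worth a small sketch (types and names invented for this example, not the actual Substreams/Firehose API): the stream emits "new" and "undo" steps, the consumer applies or reverts each one, and it persists the last cursor so a restart resumes exactly where it left off — no re-indexing, no gaps.

```rust
// Illustrative-only model of fork-aware streaming with cursors.
enum Step {
    New { block: u64, cursor: String },  // a block becomes part of our view
    Undo { block: u64, cursor: String }, // a fork: roll this block back
}

// Returns the blocks currently applied and the last cursor to persist.
fn consume(stream: &[Step]) -> (Vec<u64>, Option<String>) {
    let mut applied: Vec<u64> = Vec::new();
    let mut last_cursor = None;
    for step in stream {
        match step {
            Step::New { block, cursor } => {
                applied.push(*block);
                last_cursor = Some(cursor.clone());
            }
            // On a fork, the stream tells us exactly what to revert.
            Step::Undo { block, cursor } => {
                applied.retain(|b| b != block);
                last_cursor = Some(cursor.clone());
            }
        }
    }
    (applied, last_cursor)
}

fn main() {
    let stream = vec![
        Step::New { block: 10, cursor: "c10".to_string() },
        Step::New { block: 11, cursor: "c11".to_string() },
        Step::Undo { block: 11, cursor: "c11u".to_string() }, // fork: 11 reverted
        Step::New { block: 11, cursor: "c11b".to_string() },  // canonical 11 arrives
    ];
    let (applied, cursor) = consume(&stream);
    println!("{:?} {:?}", applied, cursor); // [10, 11] Some("c11b")
}
```

The key property: because undo steps are explicit and the cursor encodes the exact stream position, a consumer never has to guess whether its view matches the canonical chain.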
This will be awesome.