Last month saw significant updates in three key areas: TAP Scalar, Verifiable Firehose, and the SQL data service. Major TAP PRs covered refinements and transitions aimed at greater flexibility and smoother software integration. The TAP contracts were audited, revealing some medium- and low-severity issues, most of which have been resolved. The Verifiable Firehose team focused on implementing and benchmarking a proof-of-concept to improve Ethereum block verification. The SQL data service team continued testing on ClickHouse and made progress on data collection.
- Major PRs: Derivation path options, EIP-712 domain customization, transition to Alloy, and asynchronous architecture.
- Contract Audits: Completed, with most issues resolved.
This month, Semiotic Labs’ main contributions centered on TAP integration and implementing the paid query flow. Many of the main facilities were implemented, and a notable milestone was the completion of the TAP smart contract audit performed by OpenZeppelin. The process went smoothly thanks to close collaboration with E&N.
Other tasks include the deployment health server (lets the gateway know the indexing status of served Subgraphs), query metrics (let Indexers monitor their quality of service), an API rate limiter (protects the indexer's public APIs, such as the deployment health server, against DoS attacks), and most of the paid query flow (verifies that queries target eligible allocations, unpacks payment information, relays queries to the graph-nodes, etc.).
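To make the rate limiter's role concrete, here is a minimal sketch of a per-client token bucket, a common rate-limiting technique. The report does not say which algorithm the indexer software uses, so the approach and every name here (`TokenBucket`, `allow`, `client_id`) are assumptions for illustration only.

```python
import time


class TokenBucket:
    """Per-client token bucket (hypothetical sketch, not the indexer's code).

    Each client gets up to `capacity` tokens, refilled at `rate` tokens/second.
    A request is served only if a whole token is available.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.buckets = {}   # client_id -> (tokens, time of last update)

    def allow(self, client_id):
        now = self.clock()
        # Unknown clients start with a full bucket.
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        # Refill for the elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, now)
            return True
        self.buckets[client_id] = (tokens, now)
        return False
```

With `rate=1.0` and `capacity=2`, a client can burst two requests and is then limited to one request per second, while other clients are unaffected.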
The past month also marked the start of a collaborative effort with GraphOps to port the indexer-service (the query payment layer on the indexers’ side) to Rust. The goals are to remove query performance bottlenecks, fix long-standing issues with memory usage, and facilitate the integration of TAP, which is also written in Rust, into the indexer stack.
- Refinements Based on Gateway Designer’s Feedback:
- Transitioned to Alloy (from Ethers-rs) for enhanced flexibility
- Optimized for Seamless Indexer Software Integration:
- Audits completed [link to report] [link to github issues]
- PoC implemented for proving the construction of Ethereum’s receipt trie using SNARKs.
- Contacted teams for distributed proof generation system benchmarking.
This month we focused primarily on implementing and benchmarking a PoC for verifying the correctness of events (stored in the receipts trie) for blocks streamed from a Firehose endpoint, which is the second milestone in the Firehose Problem Statement. The current design verifies a block by reconstructing its receipts trie and comparing the resulting root to the receipts root in a “trusted” block header that has been accepted by Ethereum’s consensus layer. The correctness of the trie construction is then proven using a SNARK. We have implemented proofs of concept using Noir and the RISC0 zkVM, and a partial prototype using Cairo is in progress. On the positive side, the PoC shows that trie construction can be proven with a SNARK. However, the best “off-the-shelf” prover time for the receipts trie of a full block was 72 minutes (using RISC0) on a single machine (e.g. a MacBook Pro). To be useful for The Graph, the SNARK prover time for this trie construction must be under 12 seconds (the Ethereum block time). The current PoCs run serially and are unoptimized, so we expect performance to improve.
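The "rebuild the commitment, then compare roots" check can be sketched as follows. This is a deliberately simplified stand-in: it uses a plain binary Merkle tree with SHA-256, whereas Ethereum's receipts trie is a Merkle Patricia trie over RLP-encoded receipts hashed with Keccak-256. The function names are hypothetical, and the SNARK that would prove the construction is not shown.

```python
import hashlib


def merkle_root(leaves):
    """Root of a simple binary Merkle tree over byte strings.

    Simplification: the real receipts trie is a Merkle Patricia trie
    (Keccak-256 over RLP-encoded data), not this structure.
    """
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        # Hash adjacent pairs; an unpaired last node is carried up unchanged.
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def verify_block(receipts, trusted_receipts_root):
    """Accept streamed receipts only if they rebuild the trusted root."""
    return merkle_root(receipts) == trusted_receipts_root
```

Any change to a receipt, or to the order of receipts, changes the recomputed root, so the comparison against the trusted header fails.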
We have also been in contact with teams who claim to have dedicated proving networks (Nexus, Satori) so that we can benchmark our PoC on state-of-the-art distributed proof generation systems. Trie generation is highly parallelizable, so benchmarking on these systems will give us insight into how much parallelized proof generation improves performance.
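A back-of-the-envelope calculation shows the size of the gap between the measured 72 minutes and the 12-second target. The sketch below uses Amdahl's law as a modelling assumption of ours; it ignores real-world costs such as proof aggregation and network overhead, and `min_workers` is a hypothetical helper.

```python
import math

PROVER_TIME_S = 72 * 60  # best single-machine prover time reported above
BLOCK_TIME_S = 12        # target: one Ethereum block interval

required_speedup = PROVER_TIME_S / BLOCK_TIME_S  # 360x


def min_workers(parallel_fraction):
    """Smallest worker count that reaches the target under Amdahl's law.

    speedup(n) = 1 / (serial + parallel_fraction / n), with
    serial = 1 - parallel_fraction. Returns None when the serial part
    alone makes the target unreachable at any worker count.
    """
    serial = 1.0 - parallel_fraction
    denominator = 1.0 - serial * required_speedup
    if denominator <= 0:
        return None
    return math.ceil(parallel_fraction * required_speedup / denominator)
```

Under perfectly parallel proving, the target needs at least 360 workers. If even 1% of the work stays serial, the maximum speedup is 100x and the target cannot be reached by parallelism alone, which is why optimizing the prover itself also matters.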
- Shifted to ClickHouse; sunk 2B records in < 12 hrs.
- Focus on DEX data and ERC-20 token properties.
- Major PRs: Support for multiple SQL dialects, async, and batch inserts.
We started using ClickHouse for testing SQL/analytics queries. We finished sinking both the Ethereum block and transaction data, around 2B records in total, in less than 12 hours. Our current focus is on collecting and testing with DEX data. The current swap pool pairs involve more than 170k different tokens, of which we have identified only 30k through our simple pricing/oracle Substream. We therefore built a database-level solution that retrieves ERC-20 token properties (name, symbol, decimals) with on-chain contract calls, so that we can include the remaining popular tokens (by tx count and total USD amount). We also hit a blocker while investigating ClickHouse’s CLA.
- Added support for multiple SQL dialects, including Postgres and ClickHouse. We abstracted the Postgres connector into “dialects”. Each dialect can use its own optimizations for inserts, updates, and deletes (if supported). We moved the existing Postgres implementation into its own dialect and added ClickHouse, with its optimizations, as a new dialect. [#33]
- Added async inserts to the ClickHouse dialect [#34]
- Added batch inserts to the ClickHouse dialect, matching the best speed we can get with ClickHouse [#35]
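The dialect abstraction and batch inserts described in these PRs could look roughly like the sketch below. It is not the code from the PRs: `Dialect`, `RecordingDialect`, and `BatchingSink` are hypothetical names, and the in-memory `RecordingDialect` stands in for a real Postgres or ClickHouse dialect so the example runs without a database.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Dialect(ABC):
    """Hypothetical interface: each database decides how a batch is written."""

    @abstractmethod
    def insert_many(self, table, rows):
        """Write all `rows` to `table` in one round trip."""


class RecordingDialect(Dialect):
    """Stand-in for a real dialect; records each batch instead of sending SQL."""

    def __init__(self):
        self.calls = []

    def insert_many(self, table, rows):
        self.calls.append((table, list(rows)))


class BatchingSink:
    """Buffers rows per table and hands them to the dialect in batches."""

    def __init__(self, dialect, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.dialect = dialect
        self.batch_size = batch_size
        self.pending = defaultdict(list)  # table -> buffered rows

    def add(self, table, row):
        rows = self.pending[table]
        rows.append(row)
        if len(rows) >= self.batch_size:
            self._flush_table(table)

    def _flush_table(self, table):
        rows = self.pending.pop(table, [])
        if rows:
            self.dialect.insert_many(table, rows)

    def flush(self):
        """Write out any partially filled batches, e.g. at shutdown."""
        for table in list(self.pending):
            self._flush_table(table)
```

Keeping the buffering logic outside the dialect means each database only has to implement its fastest bulk-write path, while batch sizing stays in one place.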
Next month will see the implementation of the cost model server (lets the Gateway fetch the indexers’ cost models), the TAP Escrow subgraph (monitors the state of TAP’s smart contract escrow accounts between indexers and gateways), the integration of TAP into the query flow, and the TAP receipt aggregation agent (lets indexers request receipt aggregates from the gateways on a regular basis). Once these are done, we should have a functional alpha, which we will begin testing end-to-end with the help of E&N.
Implement several features and tests for achieving a “fully functional alpha”:
- Cost model server
- TAP Escrow subgraph
- Scalar TAP receipts management integration into the paid query flow
- Create the TAP receipt aggregator agent
- (at this point a fully functional “alpha” should be complete)
- Integration (end-to-end) tests with E&N
- Further PoC benchmarking and design optimizations.
We will continue benchmarking the PoC and start investigating design optimizations and ways to take advantage of parallel proof generation, both with “off-the-shelf” optimizations (e.g. in RISC0) and with more custom ones (e.g. using the Nova proving system).
We will port our API backend from Python to Rust. We expect large speedups on some queries from materializing and aggregating data based on events. Unfortunately, some of the related data cannot always be available or materialized when a new event arrives, either because data collection is asynchronous (e.g. pricing) or because of its size (e.g. transactions). We therefore plan to materialize this data lazily to minimize “joins” in queries, which are a bottleneck.
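One way to picture lazy materialization is the sketch below. The design is still a plan, so this is purely illustrative: `LazySwapView`, `on_swap_event`, and `price_lookup` are hypothetical names, and an in-memory dictionary stands in for the database.

```python
class LazySwapView:
    """Swap rows are stored as events arrive; `usd_amount` is filled on first read.

    Hypothetical sketch: `price_lookup(token, block)` returns a price, or
    None if pricing data for that block has not been collected yet.
    """

    def __init__(self, price_lookup):
        self.price_lookup = price_lookup
        self.rows = {}

    def on_swap_event(self, swap_id, token, amount, block):
        # Pricing may lag behind events, so the derived column starts empty.
        self.rows[swap_id] = {
            "token": token,
            "amount": amount,
            "block": block,
            "usd_amount": None,
        }

    def get(self, swap_id):
        row = self.rows[swap_id]
        if row["usd_amount"] is None:
            price = self.price_lookup(row["token"], row["block"])
            if price is not None:
                # Materialize once; later reads need no lookup (no "join").
                row["usd_amount"] = row["amount"] * price
        return row
```

The first read after pricing data arrives pays for the lookup; every later read returns the stored value directly.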
- Sam presented “Real-Time Data Improvements in The Graph” at https://bass.sites.stanford.edu/. Link to presentation to be updated.