Semiotic Labs October 2023 Update

:woman_astronaut: Summary

Last month saw significant advancements in the TAP Scalar and Verifiability/data correctness domains. Within the TAP Scalar project, with the support of Jannis from E&N, standard components were refactored out of the main indexer-service-rs crate to support future development. Although development of the indexer-tap-agent began, design issues prevented a full prototype delivery. Several PRs, developed in collaboration with GraphOps and E&N, further integrated TAP functions. Our team made strides with three main projects on the Verifiability front: verifying indexed event correctness from Firehose endpoints, developing an optimistic verification protocol for commitments, and studying cutting-edge verifiable SQL systems. We also designed a protocol for ensuring verifiable data accuracy using optimistically verifiable commitments. Furthermore, our Verifiable Queries & Product Tree project demonstrated the feasibility of verifying specific query types for event logs using validity proofs. To improve our developing SQL data service, we implemented strategies to achieve data consistency, handle case sensitivity in wallet addresses, and optimize query execution.

:tada: Looking back (what was delivered)

TAP Scalar/payments

Jannis from E&N joined the effort and started work on refactoring standard components out of the main indexer-service-rs crate into a common crate and cleaning them up. The goal is to facilitate the development of future paid services in The Graph.

On the TAP side of the effort, development started on an indexer-tap-agent, tasked with executing all the receipt checks and requesting RAVs from the consumers. Some design issues slowed our progress, so we couldn’t deliver a fully featured prototype this month.
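
As a rough illustration of the kind of work the agent does, the sketch below filters a batch of receipts and folds the valid ones into a single aggregate, similar in spirit to a RAV. This is not the indexer-tap-agent code: the field names, the specific checks, and the `check_receipts` and `aggregate` helpers are all hypothetical, and signature verification is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Receipt:
    # Hypothetical fields, chosen for illustration only.
    allocation_id: str
    timestamp_ns: int
    nonce: int
    value: int


@dataclass(frozen=True)
class Aggregate:
    allocation_id: str
    timestamp_ns: int      # latest receipt timestamp covered by the aggregate
    value_aggregate: int   # running sum of receipt values


def check_receipts(receipts: List[Receipt], allocation_id: str,
                   min_timestamp_ns: int) -> List[Receipt]:
    """Keep receipts for the right allocation that are newer than the last
    aggregate and not duplicated. Signature checks are omitted (assumption)."""
    seen = set()
    valid = []
    for r in receipts:
        key = (r.timestamp_ns, r.nonce)
        if r.allocation_id != allocation_id:
            continue
        if r.timestamp_ns <= min_timestamp_ns:
            continue
        if key in seen:
            continue
        seen.add(key)
        valid.append(r)
    return valid


def aggregate(valid: List[Receipt], allocation_id: str,
              previous: Optional[Aggregate] = None) -> Aggregate:
    """Fold valid receipts into a new aggregate on top of the previous one."""
    base_value = previous.value_aggregate if previous else 0
    base_ts = previous.timestamp_ns if previous else 0
    latest = max([base_ts] + [r.timestamp_ns for r in valid])
    return Aggregate(allocation_id, latest,
                     base_value + sum(r.value for r in valid))
```

In this model, a duplicate receipt or one addressed to a different allocation is simply dropped before aggregation; a real agent would presumably also record why each receipt was rejected.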

Indexer Service Rust port + TAP integration PRs (collaboration with GraphOps and E&N):

  • TAP receipts handling in the indexer-service paid query flow [PR #47]
  • Refactor code into indexer-common crate (Jannis) [PR #53]
  • Various refactors and cleanups (Hope/Jannis) [PRs #46, #52, #56, #60]

Verifiability/data correctness

We made progress on three tasks this month:

  • Checkpointed the development and benchmarking of the proof-of-concept for verifying the correctness of indexed events (stored in the receipts trie) from blocks streamed from a Firehose endpoint.
  • Designed a protocol for optimistically verifying commitments, which directly applies to upgrading proofs of indexing (POIs).
  • Surveyed the state of the art for verifiable SQL systems.

Optimistically-verifiable commitments

We designed a protocol which can be used to verify that a commitment has been constructed as expected. The protocol is optimistic in that it relies on a dispute mechanism to challenge the correctness of commitments. The protocol has a 1-of-N trust assumption, meaning that as long as one participant is honest, the protocol is secure. The protocol uses a refereed game to manage disputes. We implemented the protocol’s proof-of-concept in Rust.

This is particularly relevant to proof-of-indexing (POI), which is essentially a commitment to data that an Indexer has indexed. Our protocol is intended to replace the human-in-the-loop arbitration process (currently used to manage disputes in The Graph) with a refereed game operated via a smart contract.

The commitments constructed with this protocol may also be used as the foundation for verifiable database queries, i.e., vSQL. Currently, the protocol only verifies data sourced from event logs with no Substream transformations.
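
To give a feel for how a refereed game can settle a dispute cheaply, here is a minimal sketch of a bisection game over a hash-chain commitment. It is a generic textbook construction under our own assumptions, not the protocol's actual design: the hash-chain commitment, the `trace`, `bisect`, and `referee` names, and the use of SHA-256 are all illustrative. For simplicity both parties hand over their full list of intermediate states, whereas an interactive game would reveal only the midpoints requested.

```python
import hashlib
from typing import List, Optional


def step(prev: bytes, event: bytes) -> bytes:
    """One commitment step: fold the next event into the running hash."""
    return hashlib.sha256(prev + event).digest()


def trace(events: List[bytes], tamper_at: Optional[int] = None) -> List[bytes]:
    """Return all intermediate states h_0..h_n. If tamper_at is set, the
    event at that index is altered to model a dishonest participant."""
    states = [hashlib.sha256(b"").digest()]
    for i, e in enumerate(events):
        if i == tamper_at:
            e = e + b"!"
        states.append(step(states[-1], e))
    return states


def bisect(claim_a: List[bytes], claim_b: List[bytes]) -> int:
    """Find an index i where the parties agree on state i-1 but not on i."""
    lo, hi = 0, len(claim_a) - 1  # invariant: agree at lo, disagree at hi
    assert claim_a[lo] == claim_b[lo] and claim_a[hi] != claim_b[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if claim_a[mid] == claim_b[mid]:
            lo = mid
        else:
            hi = mid
    return hi


def referee(events: List[bytes], claim_a: List[bytes],
            claim_b: List[bytes]) -> str:
    """Recompute only the single disputed step and name the winner."""
    i = bisect(claim_a, claim_b)
    expected = step(claim_a[i - 1], events[i - 1])
    if claim_a[i] == expected:
        return "a"
    if claim_b[i] == expected:
        return "b"
    return "neither"
```

The point of the design is that the referee (for example, a smart contract) re-executes one step rather than the whole computation, and a single honest challenger is enough to expose an incorrect commitment, which matches the 1-of-N trust assumption.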

Verifiable queries & product tree

Over the past few months, we have developed a protocol that can be used to verify the correctness of certain types of queries for event logs using validity proofs. The proving system combines a SNARK and an inclusion proof similar to Merkle inclusion proofs. Using validity proofs implies that a proof of correctness accompanies each query response. An example of the queries that can be answered is “How many times has this contract emitted a particular event in this block?”

At a high level, the protocol works by specifying contracts of interest and associated events of interest and encoding them into a data structure (product tree), which enables efficiently verifiable queries about the events. For the queries to be verifiable, we need to prove that the events of interest were actually emitted in a block and that the corresponding encoding was done correctly. To do this, we use a SNARK to prove that the Receipt Tree for a block was built and encoded as expected. For more details, see here.
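
The inclusion-proof half of this can be sketched with a plain binary Merkle tree. This is a stand-in only: the actual product tree encoding is not reproduced here, the SNARK step is omitted, and the leaf layout (contract, event name, count) and all function names are assumptions made for illustration.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(contract: str, event: str, count: int) -> bytes:
    """Hypothetical leaf: how many times `contract` emitted `event` in a block."""
    return h(b"leaf|" + f"{contract}|{event}|{count}".encode())


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return tree levels from leaves up to the root.
    Levels with an odd number of nodes duplicate their last node."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([h(b"node|" + cur[i] + cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect (sibling hash, node-is-left-child) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return proof


def verify(root: bytes, leaf_hash: bytes,
           proof: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the path to the root and compare against the commitment."""
    node = leaf_hash
    for sibling, is_left in proof:
        node = h(b"node|" + (node + sibling if is_left else sibling + node))
    return node == root
```

A response to "how many times did this contract emit this event in this block?" would then carry the count together with its proof, and a client holding only the root could reject a response with the wrong count.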

We evaluated several off-the-shelf SNARK proving systems (Noir, Cairo, RISC Zero) for proving time and found the most performant to be RISC Zero using their Bonsai proving network. Proof generation takes about 250 ± 20 seconds, varying with the load on the RISC Zero servers. Note that this proof only has to be generated once per block and can be shared amongst all Indexers interested in the same events. Because the proofs for each block are independent of one another, it should be possible to pipeline proof generation for a fixed latency bottlenecked by the proof generation time (e.g., ~250s using RISC Zero/Bonsai). The code we developed is located here.
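
The pipelining argument can be made concrete with a little arithmetic. The sketch below assumes a 12-second block interval (an assumption on our part, matching Ethereum mainnet slots) and that each block starts proving as soon as it arrives; the function names are illustrative.

```python
import math
from typing import List


def provers_needed(proof_seconds: float, block_seconds: float) -> int:
    """Number of proofs that must be in flight at once to keep up with the chain."""
    return math.ceil(proof_seconds / block_seconds)


def proof_ready_times(num_blocks: int, proof_seconds: float,
                      block_seconds: float) -> List[float]:
    """Time at which each block's proof is available when every block
    starts proving on arrival (block i arrives at i * block_seconds)."""
    return [i * block_seconds + proof_seconds for i in range(num_blocks)]


print(provers_needed(250, 12))        # 21 concurrent proofs at the typical time
print(provers_needed(270, 12))        # 23 at the slow end of the observed range
print(proof_ready_times(3, 250, 12))  # [250, 262, 274]
```

Under these assumptions, roughly 21 to 23 proofs would need to run concurrently, and each block's proof would trail the block by a constant ~250 seconds rather than falling further behind over time.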

What’s next for this work? This system provides a novel method for proving certain types of queries for Subgraphs that index event logs, and it would be interesting to see how these queries can be incorporated into The Graph, for example, as another data service. Additionally, we would like to investigate how this proving system might enable validity proofs for other components of The Graph, e.g., POIs.

SQL data service

We have identified several critical challenges in efficiently collecting and analyzing data from blockchain transactions. A central objective is swiftly providing aggregated data, such as identifying the most-traded ERC-20 token in a given week. We have devised a set of rules and strategies to address these challenges and enhance our query performance.

Rules & strategies:

  1. Minimizing Joins in Queries: We prioritize minimizing joins between large tables, notably transactions and swaps, to optimize query performance. This involves materializing essential data within the relevant tables to reduce the need for frequent joins.
  2. Ensuring Data Completeness and Consistency: Asynchronous data collection processes may introduce empty fields. We propose adding necessary fields to Substreams to mitigate this issue, ensuring comprehensive data representation.
  3. Handling Case Sensitivity in Wallet Addresses: Because string comparisons in ClickHouse are case-sensitive, we need to standardize wallet addresses, which are often expressed in mixed case due to checksum formatting. To address this, we convert all strings to lowercase during SQL parsing and data ingestion.
  4. Optimizing Query Execution with Projections: We are identifying frequently used and performance-intensive queries. By creating projections based on these queries, we leverage ClickHouse’s query rewrite feature to transparently read from these projections instead of the original base table, significantly reducing query execution time.
  5. Performance Comparison and Future Data Integration: Comparative analysis indicates a substantial performance improvement of nearly 10x compared to Dune.
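
The address handling in rule 3 can be sketched as follows. This is a narrower variant than the one described above: rather than lowercasing every string, it lowercases only 0x-prefixed 40-hex-digit literals, so that other string values keep their case. The regular expression and both function names are assumptions for illustration, not part of our pipeline.

```python
import re

# A 20-byte address written as 0x + 40 hex digits. The word boundaries keep
# longer hex strings (e.g. 32-byte hashes) from being partially matched.
_ADDRESS = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def normalize_address(address: str) -> str:
    """Lowercase a single address, rejecting anything that is not one."""
    if not _ADDRESS.fullmatch(address):
        raise ValueError(f"not a 20-byte hex address: {address!r}")
    return address.lower()


def normalize_addresses_in_sql(sql: str) -> str:
    """Lowercase every address literal found in a query string."""
    return _ADDRESS.sub(lambda m: m.group(0).lower(), sql)
```

Applying the same normalization at ingestion time and at query time is what makes equality filters on address columns match regardless of how the caller wrote the address.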

:rocket: Looking ahead (upcoming priorities)

TAP Scalar/payments

Indexer Service Rust port + TAP integration (collaboration with GraphOps and E&N):

  • Complete indexer-tap-agent
  • Integrate TAP RAV and fee redemption in indexer-agent

Verifiability/data correctness

We will meet with other core devs and plan the next steps for further integration in the protocol. Specifically, we have four topics of interest:

  • The protocol for verifying the correctness of Firehose’s flat file output, which enables Indexers to trustlessly sync a Firehose node by downloading flat files from other Firehose providers rather than syncing the node from genesis.
  • The protocol for verifying the correctness of event log queries using product trees.
  • The optimistically verifiable commitment protocol with applications to POIs.
  • The state of the art for existing verifiable SQL systems.

Other notes

  • Hired an LLMOps Engineer