Semiotic October 2024 Update

Summary

In our last update, we discussed updating our priorities. This last month has shown the benefits of that restructuring, as we’ve been able to fully commit to high-priority efforts. Chief among these are TAP and Verifiable Extraction (VE).

TAP has finally made it into production. Already, 6.32% of the network’s queries are routed through it, and we hope to raise that share to 50% by the end of the quarter. We’re hard at work on bug fixes and new features, and we’ve even started to plan TAP v2.

We’ve wrapped up research work on pre-merge Ethereum data for VE (previously Verifiable Firehose). Over the next month, we’ll move this work towards production readiness while continuing the research tasks that remain for post-merge data.

Last but not least, we got our AI paper on synthetic data generation accepted to a NeurIPS workshop! This research will be the bedrock on which we build a potential future effort of creating a natural language interface for The Graph.

TAP

Over the past month, we released the stable version 1.0.0 of indexer-tap-agent and indexer-service-rs. With the launch, we updated the release policy for the projects we maintain: we now publish a new version for every feature or bug fix that we push into the code.

We plan to keep doing the same work: finding bugs and implementing the features Indexers want. We also plan to refactor the code iteratively so that newcomers can understand it and start contributing quickly. We will soon start working on TAP v2. With the Horizon update, we are excited to port our current payment system to support multiple data services. Stay tuned for more information on that!

Features and Bug fixes

  • Added feature to inject environment variables inside your config
  • Removed the requirement for the --config command argument when all config fields are provided via environment variables
  • Updated the default log level to INFO
  • Added a single prefix INDEXER_ to be used in environment variables across indexer-tap-agent and indexer-service-rs
  • Added an alternative to database_url for database configuration: separate host, user, password, and database fields
  • Added metrics to indexer-service-rs
  • Updated how we track receipt fees to prevent certain errors from appearing
  • Fixed a bug in indexer-agent where it tried to redeem RAVs when the escrow account had no funds
  • Fixed bugs related to missing pagination in indexer-agent
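To make the configuration changes above more concrete, here is a minimal Python sketch of how prefixed environment variables could override values from a config file, and how separate database fields could stand in for database_url. It is an illustration only, not the code in indexer-tap-agent or indexer-service-rs: the INDEXER_ prefix and the database field names come from the list above, while the double-underscore separator for nested keys, the "database" section name, the helper names, and the postgres URL scheme are assumptions.

```python
from urllib.parse import quote

PREFIX = "INDEXER_"  # prefix named in the update
SEPARATOR = "__"     # assumption: separator between nested config keys


def overrides_from_env(environ):
    """Collect prefixed variables into a nested dict of config overrides."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # unrelated variables are ignored
        path = name[len(PREFIX):].lower().split(SEPARATOR)
        node = overrides
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return overrides


def merge(base, overrides):
    """Return a copy of base with overrides applied recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def database_url(db):
    """Use database_url if present, else build one from the separate fields."""
    if "database_url" in db:
        return db["database_url"]
    return "postgres://{}:{}@{}/{}".format(
        quote(db["user"], safe=""),
        quote(db["password"], safe=""),
        db["host"],
        db["database"],
    )


# Hypothetical values: a partial file config completed by environment variables.
file_config = {"database": {"host": "localhost", "user": "indexer", "database": "indexer"}}
env = {
    "INDEXER_DATABASE__HOST": "db.internal",
    "INDEXER_DATABASE__PASSWORD": "s3cret",
    "PATH": "/usr/bin",
}
config = merge(file_config, overrides_from_env(env))
print(database_url(config["database"]))
```

In this sketch the environment always wins over the file, so a deployment could supply every field through variables and skip the file entirely, which is the behavior the --config change describes.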

Verifiable Extraction

This month, we have initiated the transformation of our research code for pre-merge Ethereum data into a production-ready system, continued work on our proof-of-concept (PoC) to verify post-merge Ethereum data using Firehose Beacon, and conducted further research to enhance VE’s versatility and utility within the ecosystem.

We have started planning the engineering work needed to transition our existing Header Accumulator, Flat Head, and Flat Files Decoder projects into a production-ready tool for verifying pre-merge Ethereum data stored in Firehose flat files. At the same time, we are researching Project Nozzle to ensure VE’s flexibility in potentially supporting it. In parallel, we are exploring methods to extend our verification chain to include on-chain events and prove their inclusion within execution layer data.

We are in the process of finalizing our PoC to extend VE for verifying post-merge Ethereum data. Pinax’s Firehose Beacon implementation has been instrumental in obtaining the necessary Consensus Layer data for testing, and we are actively integrating it into our workflow. We aim to complete the PoCs for post-merge Ethereum data soon. We then expect to transition into a hybrid research-engineering phase, during which we will address edge cases introduced by various Ethereum upgrades and the evolving context of data inclusion verification.

AI

We are pleased to announce that our recent paper on synthetic data generation has been accepted at the 3rd Table Representation Learning Workshop @ NeurIPS 2024. We look forward to the public release of our research and resources. In our latest round of experiments in the in-domain setting of the Spider benchmark, models trained on synthetic data generated with our methodology showed improved results. Their Execution Accuracy exceeds that of models trained on manually generated data, but under the expanded Test Suite Execution Accuracy metric they still underperform those models.

During this upcoming month, we will be increasingly focused on identifying a path to implementing a natural language interface to data on The Graph. We plan to outline an incremental process to bring this to market, seeking to identify the highest-impact features as we build toward our idealized end state. Outside of Text-to-SQL, we intend to devote resources to investigating the potential applications of machine learning to assist the delegation process. This latter focus, delegation optimization, results from discussions held at the recent core developer offsite.
