Semiotic Labs April 2024 Update

:woman_astronaut: Summary

We’ve completed multiple action items for TAP Micropayments, including advancing the indexer-agent PR review and running end-to-end tests on the Testnet, which is now successfully serving queries. We’ve also introduced an actor model in the TAP-agent and released TAP-core 1.0. Additionally, we developed the Verifiable Firehose, a tool that cryptographically verifies Ethereum data stored in Firehose flat files, which we plan to release publicly soon. In data services, our Subgraph:SQL data service is moving towards a private beta, having integrated substantial feedback that led to UX enhancements and bug fixes. In LLM Fine-Tuning & NL–SQL Data Science, we continued to narrow the performance gap between models trained on synthetically and manually generated data, and we drafted a technical write-up of our experiments and methodologies.

:tada: Looking back (what was delivered)

TAP Micropayments

Milestone 1: Parallel testing on Testnet (target end: end of April)

  • Action item 1: Finish review of indexer-agent PR (#869) (E&N) (WIP!)

    • New improvements as separate PRs (e.g., locally indexing TAP subgraph)
  • Action item 2: Running the end-to-end tests on Testnet (Arbitrum Sepolia) (InfraDAO)

    • Discord channel setup (#scalar-tap), instructions shared with InfraDAO (Alexis) :white_check_mark:
    • Generating queries on testnet (InfraDAO to work with Theo) :white_check_mark:
      • InfraDAO (Vincent | Data Nexus :fire:) doing their “regular stuff” (WIP)
      • Apr 17: “And we’re successfully serving queries, thank you!”
      • Plan: “Two weeks of testing, once up and running”
  • Action item 3: Running systematic tests on local-network and Testnet (Semiotic)

    • Run test indexer (Carlos, supported by Alexis) :white_check_mark:
    • Testing, testing, testing (Carlos) (WIP)
  • Action item(s) 4: “Other improvements”

    • Implement actor model in TAP-agent (Gustavo) :white_check_mark:
    • Release of TAP-core 1.0 (Gustavo) :white_check_mark:
    • Prepare and merge PR for local-network (WIP)
  • Action item(s) 5: “docs”

    • Dev update during the IOH meeting (March 5th, Alexis) (recording) :white_check_mark:

Verifiable Firehose

We have developed a standalone tool that enables the cryptographic verification of Ethereum data stored in Firehose flat files. Currently, this tool only supports the verification of pre-merge data. As an example use case, an Indexer could “warp sync” a Firehose node by downloading flat files from peers and verifying them with the tool, so new Indexers could sync their Firehose node in days rather than the weeks or months it currently takes. We have shared a version of the tool with other core developers and a few Indexers for testing and feedback, which has led to several improvements. We plan to publicly release and market it as a standalone tool, which will allow us to gauge demand for a potential data service based on it.
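
The “warp sync” flow described above can be sketched as a loop that accepts a peer-supplied file only if it passes verification. This is a minimal illustration under stated assumptions, not the tool's actual method: `FlatFile`, `warp_sync`, and `trusted_digests` are hypothetical names, and a plain SHA-256 digest stands in for the Ethereum-specific checks the real tool performs.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FlatFile:
    """Hypothetical stand-in for one Firehose flat file received from a peer."""
    name: str
    payload: bytes


def digest(payload: bytes) -> str:
    # Assumption: SHA-256 is a placeholder for the real cryptographic check.
    return hashlib.sha256(payload).hexdigest()


def warp_sync(files, trusted_digests):
    """Split peer-supplied files into accepted and rejected names.

    `trusted_digests` maps file name -> expected digest and is assumed to
    come from a source the Indexer already trusts.
    """
    accepted, rejected = [], []
    for f in files:
        expected = trusted_digests.get(f.name)
        if expected is not None and digest(f.payload) == expected:
            accepted.append(f.name)
        else:
            # Unknown or tampered files are never used to sync the node.
            rejected.append(f.name)
    return accepted, rejected
```

The point of the design is that peers do not need to be trusted: a file that fails verification is simply dropped and can be fetched again from another peer.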

Subgraph:SQL Data Service

We’ve made several significant steps toward launching the private beta in the past month. We have updated the gateway code for SQL and deployed it, though it is currently unused. We also tested our gateway modifications using the local-network tool developed by the TAP team. Our final remaining task on the gateway side is to enable user<>gateway payments. Simultaneously, we’ve been receiving good feedback from our internal testers, much of which has translated into UX improvements. We’ve continued adding subgraphs to the internal testing phase, which has unearthed a few bugs we’ve since fixed. We’ve worked on many other things for SQL, from a website to an ISA.

LLM Fine-Tuning & NL–SQL Data Science

This past month, we continued our work towards reducing the gap in performance between models trained on synthetically and manually generated data. In addition to adopting a new training framework, we began scaling model size to gain insight into performance on more complex questions/queries. This included additional investigations into the impact of training data distribution on model performance, focusing primarily on query and database complexity. In support of prior work, we drafted a full technical write-up of experiments and methodologies.

:rocket: Looking ahead (upcoming priorities)

TAP Micropayments

  • Milestone 1: Finish testing on Testnet (Arbitrum Sepolia)

    • Action item 3: Running systematic tests on local-network and Testnet (Semiotic)
      • Tests with failure modes (collecting malicious RAVs, incorrect signatures, no allocation, etc.)
      • Test specification (TODO)
    • Action item(s) 4: “Other improvements”
      • Migrate TAP-Aggregator to gRPC (improvement, breaking change) (postpone?)
    • Action item(s) 5: “docs”
      • Documentation (polish, update, Indexer Migration Guide?)
      • Inform GraphOps to add launchpad plus docs, ping StakeSquid (not a blocker)
  • Milestone 2: Deployment to production on Arbitrum One Mainnet (May 2024)

    • Plan: Progressive Rollout (testnet → mainnet)
    • Smart contracts (TAP and Escrow) deployed :white_check_mark:
    • Escrow subgraph deployed :white_check_mark:
    • Gateway with TAP components up and running
    • Workshop during the Indexer Office Hours
    • (Announcements, blog posts, tweets, marketing, …)
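
The failure-mode tests listed under Action item 3 above (incorrect signatures, no allocation, and so on) could be organized as a table of cases, each pairing an input with the rejection it should trigger. The sketch below is only an illustration of that test layout: `Receipt`, `validate`, and the key are hypothetical, and an HMAC stands in for the real signature scheme, so none of this reflects the actual TAP code.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical, simplified receipt; not the real TAP data structure."""
    allocation_id: str
    value: int
    signature: bytes


def sign(key: bytes, allocation_id: str, value: int) -> bytes:
    # Assumption: HMAC-SHA256 is a stand-in for the real signature scheme.
    message = f"{allocation_id}:{value}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


def validate(receipt, key, open_allocations):
    """Return None if the receipt is acceptable, else a rejection reason."""
    if receipt.allocation_id not in open_allocations:
        return "no allocation"
    expected = sign(key, receipt.allocation_id, receipt.value)
    if not hmac.compare_digest(expected, receipt.signature):
        return "incorrect signature"
    return None


KEY = b"hypothetical-gateway-key"
OPEN = {"alloc-1"}

# Each case: (name, receipt under test, expected rejection reason or None).
CASES = [
    ("valid receipt", Receipt("alloc-1", 10, sign(KEY, "alloc-1", 10)), None),
    ("incorrect signature", Receipt("alloc-1", 10, b"\x00" * 32), "incorrect signature"),
    ("no allocation", Receipt("alloc-9", 10, sign(KEY, "alloc-9", 10)), "no allocation"),
]


def run_cases():
    """Map each case name to whether validation behaved as expected."""
    return {name: validate(r, KEY, OPEN) == expected for name, r, expected in CASES}
```

Keeping the cases in one table makes it straightforward to add a row for each new failure mode as the test specification is written.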

Verifiable Firehose

Next, we are designing a verifiable data service for The Graph protocol. The data service can respond to queries like “Give me all the swap events emitted from the Uniswap v2 contract from block x to block y.” The data service is verifiable, meaning Indexers will be able to compute a proof that the results provided came from the blockchain and that no data were omitted from the requested range. This proof will be efficiently verifiable and on-chain. Initially, this data service will only serve queries for events or transactions, i.e., data committed to the blockchain; trace data are not verifiable at this time and will not be included in v1 of this data service. Additionally, note that there are no data transformations; this is intentional. The idea is that this data will be input to other applications that can perform verifiable transformations, e.g., an enshrined coprocessor or one of the external coprocessing efforts like Axiom.
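
To make the idea of proving that a result “came from the blockchain” more concrete, here is a minimal sketch of a Merkle inclusion proof, a common building block for this kind of claim. It is an illustration under simplifying assumptions, not the design of the data service: it uses a plain binary tree with SHA-256, all function names are hypothetical, and it shows only inclusion, not the separate guarantee that no data were omitted.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Duplicate the last node when a level has an odd number of nodes.
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Compute the root commitment over a non-empty list of leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Return the sibling hashes (with side flags) from a leaf up to the root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf and its proof, then compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

A consumer holding only the root can check a returned event with `verify(event, proof, root)`; the proof grows with the logarithm of the number of leaves, which is what makes this style of check cheap to verify.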

Subgraph:SQL Data Service

In the upcoming month, our biggest goal is to clear our remaining tasks and launch the SQL private beta. In this phase, we’ll onboard select Indexers and Consumers to help us identify issues and make further improvements before we launch the public beta. We’ll also start scoping the next iteration of SQL on The Graph, enabling long-running queries and a more transparent pricing system for Indexers and Consumers.

LLM Fine-Tuning & NL–SQL Data Science

This month, we plan to continue investigating various downsampling and filtering techniques to achieve data parity across all difficulty categories. This will include scaling experiments and data generation. In support of these efforts, we will continue to refine our tooling and reporting methods to shorten both experiment time and time to reporting.


Sam will present at Coinbase’s Machine Learning & Blockchain Research Summit.