Semiotic Labs December 2023 Update

:woman_astronaut: Summary

The TAP Escrow subgraph was merged in November, and we readied the TAP Agent for testing. At Devconnect, we learned how our Verifiable Firehose solution could be useful as an EIP-4444 solution for Nethermind. We also presented the current status of the SQL data service at Devconnect. Looking ahead, we are focused on deploying TAP smart contracts on the Sepolia network, delivering our EIP-4444 solution for pre-merge data, and continuing to refine our SQL and AI data services.

:tada: Looking back (what was delivered)

TAP Scalar/payments

We made progress on several TAP components. The TAP Escrow subgraph was merged, and the team readied the TAP Agent for integration testing in collaboration with E&N and GraphOps. Hope started integrating TAP with the indexer-agent so that it can redeem TAP RAVs on allocation closure. We started setting up a testnet indexer specifically for testing the new indexer components. Concurrently, integration tests between the aggregator/gateway and indexer commenced, but we still need assistance deploying the Gateway components (Receipt Aggregator, Escrow Manager) on Testnet.

TAP PRs and Documentation (collaboration with GraphOps and E&N):

  • Semiotic: TAP Agent working (ready for integration testing) - teamwork! :slight_smile:
  • Semiotic: TAP Core [PR 189, PR 190, PR 191] → Released v0.7.0
  • Semiotic: TAP Escrow subgraph [PR 13]
    • E&N smart contract PR (unassignedDeposits)
  • Semiotic/E&N: TAP smart contracts [repo] deployed on Sepolia [deployment plan] (Tomas → Bryan)
  • E&N: audit for unassignedDeposits received (2 issues, to be discussed)
  • E&N: Starting the integration tests between the aggregator/gateway and indexer
    • Deploying Gateway bits (Receipt Aggregator, Escrow Manager) on Testnet (Theo)

Verifiability/data correctness

Progress this month focused primarily on our proposal to use The Graph as a solution for EIP-4444, published on ethresear.ch. We showcased this proposal through a presentation (slides) at DevCon Datapalooza (video) and at the Fenbushi Capital Research house. Collaborative discussions with the Nethermind client team and Portal Network have been fruitful; in particular, the Nethermind team expressed interest in our proposal, contingent on our ability to provide pre-merge data in ERA1 file format. With this motivation, we are finalizing our work on Verifiable Firehose (VFH) for pre-merge data and aim to deliver a Proof of Concept (PoC) for an ERA1 Substreams data service by the end of December. This work includes implementing a Header Accumulator for flat files, which is essential for pre-merge VFH and which we are currently integrating with a Flat File Decoder; for more details, see our ethresear.ch proposal. Additionally, we have developed a PoC for VFH with post-merge data that uses sync committee signatures for flat file verification. We will build on this PoC to deliver a VFH solution for post-merge data once the pre-merge solution and the ERA1 Substreams data service PoC are completed.
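For readers unfamiliar with the idea behind a header accumulator, here is a minimal sketch of the general technique: commit to a batch of block headers with a single root hash, then prove that one header belongs to the batch. This is not our implementation. It assumes a plain SHA-256 binary Merkle tree, whereas the actual pre-merge accumulator follows the format described in the ethresear.ch proposal. The function names and the placeholder header bytes are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest (an assumption; the real accumulator uses its own scheme)."""
    return hashlib.sha256(data).digest()


def _pad(level):
    # Duplicate the last node so every node has a sibling.
    return level + [level[-1]] if len(level) % 2 == 1 else level


def _next_level(level):
    level = _pad(level)
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Return a single root hash committing to all leaves, in order."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Return the sibling hashes needed to recompute the root from one leaf."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        level = _pad(level)
        proof.append(level[index ^ 1])  # sibling sits next to the current node
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf, index, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = _h(leaf)
    for sibling in proof:
        if index % 2 == 0:
            node = _h(node + sibling)
        else:
            node = _h(sibling + node)
        index //= 2
    return node == root


if __name__ == "__main__":
    # Placeholder bytes standing in for encoded block headers.
    headers = [f"header-{n}".encode() for n in range(5)]
    root = merkle_root(headers)
    proof = merkle_proof(headers, 3)
    print(verify_proof(headers[3], 3, proof, root))
```

The point of the design is that a verifier only needs the root and a short proof, not the full set of headers, to check that a given header is part of the committed history.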

SQL data service

Last month, Sam presented at Devconnect on our current status of bringing high-performance SQL queries to The Graph (YouTube). We planned to test sending SQL queries to StreamingFast or Pinax in November, but updates to ClickHouse caused a delay. We also interviewed two analytics product teams: one currently building on The Graph (DappLooker) and one that would like to build on The Graph (Crellos). Our goal in these conversations was to understand what a good SQL product looks like to them.

AI data service

  • Evaluated the ability of various open-source models to complete natural language to SQL questions.
  • Began the data collection, filtering, and labeling process of converting AgentC user questions to a training dataset (Ongoing: ~1000 Queries evaluated, 350+ Filtered for the initial training set, 89 Handpicked for Testing).
  • Established initial evaluation suite and tooling focusing on modularity to define distributions easily.
  • Implemented initial fine-tuning pipeline for nl2sql, focused on being modular with HuggingFace 🤗 as the model repo.
  • Defining key metrics for data distribution: Question similarity, AST similarity, SQL difficulty, and Question-AST connection.

:tada: Looking ahead

TAP Scalar/payments

The focus will shift to deploying TAP smart contracts on the Sepolia network, with Tomas leading and Bryan assisting. Carlos and Theo will collaborate to deploy the Escrow subgraph, updating it for the Sepolia network. We hope to have a working Gateway and Indexer with the new components and be able to test the entire payment flow end-to-end on Testnet.

Verifiability/data correctness

We plan to deliver our VFH solution for pre-merge data and a proof-of-concept for ERA1 Substreams data service.

SQL data service

Our primary goal for December is to send SQL queries to StreamingFast and Pinax. Semiotic is meeting this week with StreamingFast to discuss Substreams:SQL, which will become the standard way to specify what and how blockchain data gets indexed to ClickHouse. Substreams:SQL will also support DBT, allowing us to create efficient aggregations over data. For example, users can request token prices across multiple chains using a single efficient SQL query.
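To make the token price example a bit more concrete, here is a rough sketch of what a single cross-chain query might look like. It runs against Python's built-in sqlite3 rather than ClickHouse, and the `token_prices` table, its columns, and the sample rows are all hypothetical. None of this reflects the actual Substreams:SQL schema.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
ROWS = [
    ("ethereum", "WETH", 100, 2000.0),
    ("ethereum", "WETH", 101, 2010.0),
    ("arbitrum", "WETH", 500, 2005.0),
    ("arbitrum", "WETH", 501, 2015.0),
    ("polygon", "WETH", 900, 1990.0),
]

# One query covers every chain because all chains share a single table.
QUERY = """
SELECT chain, AVG(price_usd) AS avg_price_usd
FROM token_prices
WHERE token = ?
GROUP BY chain
ORDER BY chain
"""


def average_price_by_chain(token):
    """Return (chain, average price) rows for one token across all chains."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE token_prices ("
            "chain TEXT, token TEXT, block_number INTEGER, price_usd REAL)"
        )
        conn.executemany("INSERT INTO token_prices VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY, (token,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for chain, avg_price in average_price_by_chain("WETH"):
        print(chain, avg_price)
```

In this sketch the aggregation is computed at query time. With DBT-style models, an aggregation like this could instead be precomputed so that the user-facing query stays cheap.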

AI data service

Next month

  • Finalize research around data distribution and implement tooling to facilitate efficient distributions of training data, effective evaluation based on clustered groups, and synthetic data generation to meet model needs.
  • Refactor fine-tuning pipeline to take advantage of training data improvements and improve monitoring.
  • Begin evaluating open-source inference engines and roadmap for integration with The Graph :t_rex: