Semiotic June 2024 Update

:woman_astronaut: Summary

We tested the TAP Indexer components internally and with InfraDAO, finding and fixing numerous bugs. We planned a testing suite to ensure TAP components behave under adverse conditions. We also started working on “verifiable flat files” within The Graph protocol, building on our flat-head tool, to incentivize accurate historical data storage by Indexers. Additionally, we developed a demonstration combining verifiable data with computation to create a trustless on-chain oracle. We launched (and will soon end) the Subgraph:SQL private beta to focus on AI and verifiability. We released the AI Services Whitepaper, launched a competition with Agentc, and engaged stakeholders to ensure AI Services’ success. Our synthetic data generation investigations for text-to-SQL showed performance improvements with larger datasets, leading to a framework for evaluating synthetic data generation methods.

:tada: Looking back (what was delivered)

TAP Micropayments

We have been testing the TAP Indexer components, both internally and with InfraDAO. We have found numerous bugs (as expected) and are continuing to fix them. In particular:

  • InfraDAO members Vincent | Data Nexus and Vince | Nodeify successfully achieved the TAP “happy path” (i.e., everybody plays nice) and managed to redeem query fees through TAP.
  • We planned a full testing/torture suite to make sure that the TAP components behave as expected even when the Sender (i.e., the Gateway) doesn’t play nice or other things break (sketched below).
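To make the adverse-condition idea concrete, here is a minimal Python sketch of the kind of check such a suite exercises: a misbehaving sender tampers with a receipt’s fee value after signing it, and the receiver’s check rejects it. The names (`Receipt`, `Sender`, `check_receipt`) and the HMAC scheme are illustrative stand-ins, not the real TAP API.

```python
# Hypothetical sketch of an adverse-condition test for receipt checking.
# Not the real TAP implementation; names and signing scheme are stand-ins.
import hashlib
import hmac
from dataclasses import dataclass

@dataclass
class Receipt:
    query_id: int
    value: int        # query fee
    signature: bytes  # sender's MAC over (query_id, value)

class Sender:
    def __init__(self, key: bytes, honest: bool = True):
        self.key = key
        self.honest = honest

    def issue(self, query_id: int, value: int) -> Receipt:
        sig = hmac.new(self.key, f"{query_id}:{value}".encode(), hashlib.sha256).digest()
        if not self.honest:
            # A misbehaving sender inflates the value after signing it.
            value *= 2
        return Receipt(query_id, value, sig)

def check_receipt(receipt: Receipt, key: bytes) -> bool:
    expected = hmac.new(key, f"{receipt.query_id}:{receipt.value}".encode(), hashlib.sha256).digest()
    return hmac.compare_digest(receipt.signature, expected)

key = b"shared-test-key"
honest = Sender(key)
cheater = Sender(key, honest=False)

assert check_receipt(honest.issue(1, 100), key)       # happy path: accepted
assert not check_receipt(cheater.issue(2, 100), key)  # tampered value: rejected
```

The real torture suite covers many more failure modes (dropped messages, stale timestamps, refusal to aggregate), but each test follows this same shape: simulate a misbehaving party and assert that the honest component rejects or recovers.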

Verifiability

We have also started working on an effort to enshrine “verifiable flat files” within The Graph protocol. This builds on our earlier work, which resulted in flat-head, a standalone tool that Indexers can use to verify that the blockchain data contained in a set of flat files is a complete snapshot of Ethereum’s canonical history. Effectively, this effort aims to provide incentives and cryptographic mechanisms that enable Indexers to prove and get paid for storing accurate historical data.
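As a rough illustration of the kind of check flat-head performs, the Python sketch below hash-chains a range of blocks read from a flat file and compares the head against a trusted canonical hash. The `Block` structure and hashing are simplified stand-ins for real Ethereum headers and the Firehose flat-file format.

```python
# Minimal sketch, assuming simplified blocks: verify that a range of blocks
# forms an unbroken parent-hash chain ending at a trusted canonical hash.
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    number: int
    parent_hash: bytes
    body: bytes

    def hash(self) -> bytes:
        h = hashlib.sha256()
        h.update(self.number.to_bytes(8, "big"))
        h.update(self.parent_hash)
        h.update(self.body)
        return h.digest()

def verify_range(blocks: list[Block], trusted_head: bytes) -> bool:
    """True iff the blocks chain together with no gaps and end at trusted_head."""
    for prev, cur in zip(blocks, blocks[1:]):
        if cur.parent_hash != prev.hash() or cur.number != prev.number + 1:
            return False  # gap or fork: snapshot is not complete/canonical
    return blocks[-1].hash() == trusted_head

genesis = Block(0, b"\x00" * 32, b"genesis")
b1 = Block(1, genesis.hash(), b"block 1")
b2 = Block(2, b1.hash(), b"block 2")
assert verify_range([genesis, b1, b2], trusted_head=b2.hash())
```

The protocol-level work adds the missing piece: economic incentives and cryptographic commitments so that Indexers can be paid for provably storing data that passes exactly this kind of check.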

We have also developed a demonstration that showcases how verifiable data from The Graph can be combined with verifiable computation to provide a trustless, on-chain oracle. The demonstration uses swap events from a USDC/ETH Uniswap v3 pool obtained via Substreams. We use the data to calculate the 24-hour volatility of the USDC/ETH price and use a SNARK to prove that the result is correct. The result, along with the proof, is posted on-chain, where a Uniswap v4 hook uses it as an oracle to calculate the dynamic fee for a v4 pool.
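For intuition, here is a minimal Python sketch of the computation the SNARK attests to: realized volatility as the standard deviation of log returns over the 24-hour swap-price series. The example prices and the mapping from swap events to prices are illustrative; the actual circuit and windowing may differ.

```python
# Sketch of the off-chain computation the proof covers: realized volatility
# of the USDC/ETH price over a 24-hour window of swap events.
import math

def realized_volatility(prices: list[float]) -> float:
    """Standard deviation of log returns over the window."""
    returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return math.sqrt(var)

# Illustrative prices reconstructed from Uniswap v3 Swap events
# (e.g., derived from sqrtPriceX96) over the last 24 hours:
prices = [3050.0, 3071.5, 3040.2, 3055.8, 3080.1]
vol = realized_volatility(prices)
print(f"24h realized volatility: {vol:.4%}")
# A v4 hook can map this value to a dynamic fee, trusting the on-chain
# SNARK verification rather than the party that computed it.
```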

Subgraph:SQL Data Service

We launched the Subgraph:SQL private beta, but we will be shutting it down soon. Semiotic’s unique expertise is in AI and verifiability, and that is where we are going to focus. As a result, Semiotic is closing its Subgraph:SQL chapter; other emerging core dev efforts will deliver superior SQL capabilities in The Graph.

AI Services

We released the AI Services Whitepaper, which outlines our plan to expand The Graph to support decentralized AI applications, along with a related GRTiQ special. We also launched a competition with Agentc to showcase the kinds of applications AI Services will enable, and we have been talking with stakeholders, from Indexers to dApp developers, to understand how to make AI Services successful.

LLM Fine-Tuning & NL–SQL Data Science

We finalized our most recent round of investigations into synthetic data generation for text-to-SQL. Across our dataset-size experiments, we consistently saw model performance continue to improve as the amount of synthetic training data grew. When we evaluated the quality of our synthetic data per query-difficulty category, it boosted overall performance on our benchmark in all cases, though in some instances we saw gaps on specific difficulty categories; closing those gaps is a potential area of future work.

With a focus on generating diverse training data, we evaluated our methodology across several key metrics and found that we can generate diverse data that still follows our desired distribution (defined, in this context, by query difficulty). We also continued to review the literature and formalized a general framework for categorizing and reviewing synthetic data generation methods for text-to-SQL, in support of a write-up of our methodology and experiments.
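As one illustration of this kind of distribution check, the hypothetical Python sketch below compares the difficulty mix of a generated dataset against a target distribution using total variation distance. The category labels and the metric are stand-ins, not our exact evaluation pipeline.

```python
# Hypothetical sketch: does the generated data match the desired
# query-difficulty distribution? Labels and metric are illustrative.
from collections import Counter

def difficulty_distribution(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Difficulty label assigned to each generated (question, SQL) pair:
generated = ["easy"] * 45 + ["medium"] * 35 + ["hard"] * 20
target = {"easy": 0.4, "medium": 0.4, "hard": 0.2}

tv = total_variation(difficulty_distribution(generated), target)
print(f"TV distance from target difficulty mix: {tv:.3f}")  # 0 = perfect match
```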

:rocket: Looking ahead (upcoming priorities)

TAP Micropayments

We will continue testing, and hopefully we’ll have fewer things to fix! We are also looking forward to the mainnet Gateway side of the TAP components being deployed, so that we can start testing there with InfraDAO.

Verifiability

We will work with other core developers to create a roadmap to enshrine “verifiable flat files” within The Graph protocol.

AI Services

We are creating our plan for the Inference Service and will start developing it this month. We will also add related content to the Semiotic Blog, covering everything from AI basics to more advanced tutorials and demos.

LLM Fine-Tuning & NL–SQL Data Science

We plan to release our synthetic datasets and model weights, and to open-source the associated data generation code.
