Semiotic Labs March 2024 Update

:woman_astronaut: Summary

Last month, for TAP, we focused on end-to-end integration testing on the local network, resolving numerous minor inconsistencies and culminating in successful RAV redeems. Despite these successes, improvements are still needed, particularly in logging and monitoring. We also contributed to notable PRs, including making TAP accounting per sender and fixing various indexer-rs and TAP core issues.

We finalized a Verifiable Firehose release. We concentrated on internal testing, bug fixes, and feature additions on the SQL Data Service and LLM Fine-Tuning & NL–SQL Data Science fronts. And we presented our position on zk-coprocessors and The Graph x AI at ETHDenver.

:tada: Looking back (what was delivered)

TAP Micropayments

Last month focused on integration testing everything end-to-end on the local network. Many minor inconsistencies were found and addressed, culminating in successful RAV redeems at the end of this monthly review cycle. There are still rough edges to address, and logging and monitoring need improvement.

Notable PRs (merged)

  • Indexer-rs:

    • Making TAP accounting per sender instead of per (sender, allocation) #113
    • Fixing indexer-service public info API #120, #121
    • Fixing receipts and RAVs DB serialization #122, #123, #124
    • Add a wait before RAV request error retry #130
  • TAP core:

    • (aggregator) Add support for multiple receipt keys #211
    • Fix RAV struct keys names to camelCase (to match contracts) #212, #213
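
As a rough illustration of the accounting change in #113, the sketch below contrasts totals keyed by (sender, allocation) with totals keyed by sender alone. It is not the indexer-rs implementation: the receipt tuples, the function names, and the values are all hypothetical and chosen only to show the change in grouping key.

```python
from collections import defaultdict

# Hypothetical receipts as (sender, allocation, value) tuples.
# The identifiers and values are made up for illustration.
receipts = [
    ("sender_a", "alloc_1", 10),
    ("sender_a", "alloc_2", 5),
    ("sender_b", "alloc_1", 7),
]


def totals_per_sender_allocation(receipts):
    """Sum receipt values with one bucket per (sender, allocation) pair."""
    totals = defaultdict(int)
    for sender, allocation, value in receipts:
        totals[(sender, allocation)] += value
    return dict(totals)


def totals_per_sender(receipts):
    """Sum receipt values with one bucket per sender, across all allocations."""
    totals = defaultdict(int)
    for sender, _allocation, value in receipts:
        totals[sender] += value
    return dict(totals)
```

With per-sender keys, the two `sender_a` receipts above land in a single bucket instead of two, which is the only difference between the two functions.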

Verifiable Firehose

This past month, we finished a Verifiable Firehose release and are currently testing it and gathering feedback by running the tool on our Firehose node. As a reminder, this tool verifies that the transactions and receipts stored in Flat Files are a complete and accurate archive of the canonical Ethereum blockchain. This lets Indexers “warp sync” their Firehose node by downloading and verifying flat files from other nodes, so a new Firehose Indexer can come online in hours rather than weeks or months.

At ETHDenver, we also presented our initial work on zk-coprocessors and how we see them being used in The Graph. Video here: https://twitter.com/EthereumDenver/status/1763692686496690514

SQL Data Service

Last month, the SQL Data Service team focused on internal testing and bug fixes for Stage 1, as described in last month's post. We added new features to the studio, including SQL autocompletion with subgraph tables and columns and the ability to search for subgraphs by name. We also created a custom gsql filetype with various metadata, which makes it easier to collaborate on queries. In anticipation of the upcoming private beta, we began discussions with potential launch partners.

LLM Fine-Tuning & NL–SQL Data Science

In the past month, we have focused on expanding our investigation into the optimal selection of query structures for data generation, which we call “seed” selection. We have continued researching the impact of various data generation strategies for fine-tuning, specifically using the T5 model family. Our ongoing literature review regarding nl2sql has been centered on data generation methods, aiming to establish benchmarks for the process. Additionally, we have continuously refactored and refined our repository and tooling (MLOps, versioning, etc.).

AI Service

We participated in an ETHDenver Crypto x AI panel, where we shared some views on where The Graph fits into the decentralized AI stack. You can watch the panel recording here: https://twitter.com/EthereumDenver/status/1763678961500815360

:rocket: Looking ahead (upcoming priorities)

TAP Micropayments

Once the last few bugs are addressed, we will start delivering the new indexer payment stack to InfraDAO for testing on testnet. We expect this to happen within the next week. We are sure InfraDAO will find many unforeseen issues that will keep us busy for the month. Simultaneously, we will work on test scenarios for the testnet (such as checking that there are no holes in the receipts accounting network-wide, stress testing, DoS attacks, error injections, etc.).

Verifiable Firehose

Next month, we will incorporate any feedback on the Verifiable Firehose release and then begin socializing the tool among Indexers. We will also contact Ethereum client teams for a follow-up on our EIP-4444 proposal.

SQL Data Service

Next month, we plan to continue with Stage 2. This means working on an SQL gateway and an SQL indexer service. We will also work on creating a website for API keys and payment management. We’ll work closely with another indexer to bring them online this month. We’ll use what we learn from this partnership to iron out any issues before we invite more Indexers to enable SQL.

LLM Fine-Tuning & NL–SQL Data Science

We plan to establish a synthetic data generation benchmarking system to compare various methodologies. We aim to expand our model evaluation system to keep abreast of new benchmarks and gain deeper insights into model performance. Additionally, we intend to broaden our research into the impact of topic selection on data generation and model performance, with the goal of better understanding methods of injecting external knowledge and reasoning into our models.

AI Service

We will finish our draft for The Graph x AI whitepaper. This whitepaper maps out how The Graph will play a critical role in decentralized AI.


Thanks for your update - so many good things! Any chance Semiotic or @Sam will be in attendance here? GTC 2024: #1 AI Conference


I really enjoyed the ETH Denver presentations. Exciting times.
