The Graph Core Dev Call #26


  1. Core Dev Team AMA (02:31)

    • Introduction to AMA session on The Graph’s subreddit
    • Discussion on The Graph’s new roadmap
    • Q&A with core developer teams
  2. GIP Updates (03:05)

    • Update on L2 Rewards on Arbitrum
    • Graph Improvement Proposals: GIP-0041 and GIP-0059
    • GIP-0056 – Staking Contract Alteration
    • GIP-0062 – Delegation Parameter Adjustments and Protocol Simplification
    • GIP-0061 – Improving the GIP Process
    • Updates on GIP-0058 and GIP-0063
  3. Bringing SQL Queries to The Graph | Semiotic Labs (11:30)

    • Strengths of GraphQL and Case for SQL in The Graph
    • Mechanism for Enabling SQL Queries in The Graph
    • Analytical Solution and ETL Process
    • ClickHouse and Columnar Database Advantages
    • Future Developments and Applications
    • SQL Data Service Timeline
  4. How DBT (Data Build Tool) Takes SQL to the Next Level (22:20)

    • DBT and SQL in Analytics
    • DBT’s Integration and Functionality
    • DBT Models and Development Process
    • Data Lineage and Modularization in DBT
    • DBT’s Open Source Nature and Developer Ecosystem
  5. The Backend: Infrastructure for SQL and DBT Integration | feat. StreamingFast (30:30)

    • Backend Challenges and Solutions
    • Integration of Firehose, Substreams, and SQL Sync
    • Composable Architecture
    • Substreams: SQL Deployable Service
    • Future Developments and Network Integration
    • Questions and Answers

As always, the community was encouraged to ask questions and participate actively in discussions.

Core Dev Team AMA (02:31)

The Graph Foundation reminded all community members that an “Ask Me Anything” (AMA) session was held on the /r/thegraph subreddit. This event featured the core developer teams answering questions on The Graph’s new roadmap! The AMA attracted a variety of insightful questions, allowing for a broad discussion on the subject.

The session offered a platform for community members to gain deeper insights into The Graph’s advancements and future, directly from the developers involved.

GIP Updates (03:05)

Update on L2 Rewards on Arbitrum

The update began with the announcement that L2 rewards on Arbitrum have been increased to 95%. This change has already taken effect. A further increase to 100% is planned, dependent on the transition of at least 90% of active and queried subgraphs to L2, as outlined in GIP-0052. The 90% benchmark itself may be revisited.

Graph Improvement Proposals GIP-0041 and GIP-0059

GIP-0041 and GIP-0059, although distinct, were bundled into a single Council proposal. The proposals previously faced a delay due to an expired vote but are now set for reconsideration. GIP-0041 proposes updates to the dispute manager contract to store dispute status on-chain, enhancing transparency in dispute resolutions. GIP-0059 proposes to remove the owner tax levied on subgraph owners for upgrades, maintaining only the curation tax. Both proposals have received positive verbal consensus from the Council and are awaiting formal approval.

GIP-0056 – Staking Contract Alteration

GIP-0056 intends to remove a specific restriction in the staking contract related to who can inject payments of query fees into the protocol. The proposed change, which has undergone security audits, would allow any address to perform this function, moving the network towards full permissionlessness. The Council has verbally agreed to this proposal, and it is moving towards formal approval.

GIP-0062 – Delegation Parameter Adjustments and Protocol Simplification

GIP-0062 addresses delegation parameters, specifically focusing on a cooldown field that Indexers set for changing indexing rewards and query fee cut. The proposal, having received informal Council approval, aims to resolve a UI bug and remove an underused feature. It reflects a broader goal of simplifying the protocol and reducing potential risks and complexities.

GIP-0061 – Improving the GIP Process

GIP-0061 seeks to streamline and clarify the GIP process. The Council provided feedback for further refinement before a formal vote. This proposal is designed to make the GIP process more efficient and transparent, reflecting existing practices.

GIP-0058 and GIP-0063

GIP-0058, which planned to replace bonding curves with indexing fees, will receive an update on the forum soon. GIP-0063, which addressed optimizing query fees sent to Indexers, will likely be withdrawn; more information will be shared on the forum soon.

Bringing SQL Queries to The Graph | Semiotic Labs (11:30)

Strengths of GraphQL and Case for SQL in The Graph

GraphQL allows developers to build custom API endpoints without needing to maintain a backend, as this is managed by Indexers. It’s particularly effective for fixed queries where developers know their data needs in advance. While GraphQL will continue to be supported, the integration of SQL is being pursued to cater to data science applications, offering enhanced speed and flexibility.

Mechanism for Enabling SQL Queries in The Graph

The integration of SQL queries into The Graph’s ecosystem involves several components. The Firehose system extracts blockchain data into flat files, which are then processed and filtered using Substreams. These are fed into ClickHouse, a combination deemed effective for analytical data processing. This setup allows for analytical operations like summations and averages that are not readily available in GraphQL. The Graph is enhancing its support for high-performance SQL, anticipating future Indexers specializing in either GraphQL or SQL.
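As a sketch of the kind of analytical operation this pipeline makes possible (the `transfers` table and its columns are hypothetical, chosen only for illustration), a daily summation and average over extracted event data might look like:

```sql
-- Hypothetical events table populated via Firehose -> Substreams -> ClickHouse.
SELECT
    toDate(block_time)  AS day,
    count()             AS transfer_count,
    sum(amount)         AS total_amount,
    avg(amount)         AS avg_amount
FROM transfers
GROUP BY day
ORDER BY day DESC
LIMIT 30;
```

Aggregations like `sum` and `avg` across an entire dataset are exactly the operations GraphQL endpoints do not readily express.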

Analytical Solution and ETL Process

The goal is to offer an analytical solution that complements, rather than overlaps with, GraphQL. This is particularly relevant for data scientists and business analysts who require flexible data structures. The ETL (Extract, Transform, Load) process involves extracting data with Firehose, shaping it with Substreams, and storing it in ClickHouse for SQL-based analytics. This process enables complex data transformations and aggregations that are less feasible with GraphQL.

ClickHouse and Columnar Database Advantages

ClickHouse, a columnar database, is highlighted for its suitability for online analytical processing (OLAP). Unlike traditional row-based databases like Postgres, columnar databases offer faster query speeds and higher compression ratios. This is particularly beneficial for aggregations and transformations. In tests, ClickHouse outperformed Postgres in query speed by nearly tenfold, even without optimization.
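To make the columnar point concrete, here is a minimal sketch of what such a table might look like in ClickHouse (schema and names are illustrative, not taken from the actual deployment). The MergeTree engine stores each column separately, which is what yields the compression and scan-speed advantages described above:

```sql
-- Illustrative ClickHouse schema; each column is stored and compressed independently.
CREATE TABLE transfers (
    block_number  UInt64,
    block_time    DateTime,
    from_address  FixedString(42),
    to_address    FixedString(42),
    amount        UInt256
)
ENGINE = MergeTree
ORDER BY (block_time, block_number);
```

An aggregation over `amount` only reads that one column from disk, rather than every row in full as a row-oriented database like Postgres would.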

Future Developments and Applications

Looking ahead, the integration of SQL within The Graph is expected to open new avenues in analytics, market research, and anti-fraud research. Semiotic Labs presented a demonstration video of their new project, which converts Ethereum tax-related natural language queries into SQL queries and delivers results in seconds. This development indicates a significant step towards more dynamic, on-demand querying capabilities within The Graph ecosystem.

SQL Data Service Timeline

  1. Define the SQL API (Prototype Done)
  2. Integrate DBTs into Substreams (Work In Progress)
  3. Test Deployment with Pinax and StreamingFast (Q4 2023)
  4. Integrate with Gateway & Billing (Q1 2024)

How DBT (Data Build Tool) Takes SQL to the Next Level (22:20)

Utilizing DBT (Data Build Tool) with SQL in The Graph significantly enhances its data handling capabilities. While SQL alone allows for complex and dynamic querying, DBT introduces robust data transformation and modeling, aligning The Graph with modern data engineering practices. DBT’s key contributions include facilitating complex data transformations, enabling the modularization of SQL queries (like Substreams) for efficient management, and providing a structured framework for development, testing, and documentation. This integration will not only elevate the querying power of The Graph but also enrich the ecosystem with advanced tools and community-driven practices, making it more adept for sophisticated data analytics and science applications.

DBT and SQL in Analytics

SQL has traditionally lacked robust development tools, a gap recently filled by DBT. DBT has become popular for its comprehensive suite of tools that facilitate development, testing, documentation, visualization, lineage, and deployment for SQL developers. It elevates SQL development to a new level of efficiency and effectiveness.

DBT’s Integration and Functionality

DBT integrates with various data platforms such as Snowflake, BigQuery, Databricks, ClickHouse, and Postgres. It enables the transformation of raw data into formatted, aggregated datasets that can be used in subsequent stages of data pipelines. DBT’s functionality encompasses a wide range of processes, from data transformation to aggregation, enhancing the versatility of SQL in data analysis.

DBT Models and Development Process

A DBT program, referred to as a ‘DBT model’, is essentially a set of SQL queries combined with DBT-specific features. These include macros, similar to those in C++, which allow for variable usage and references to other data models. DBT models also include configurations for materializing tables and other output settings. The development process in DBT is streamlined through these features, allowing for more complex and interconnected SQL queries.
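A minimal sketch of such a model may help (file name, model names, and columns are hypothetical; the Jinja-style `config` and `ref` macros are standard DBT features):

```sql
-- models/daily_transfer_stats.sql — illustrative DBT model.
-- {{ ref() }} references another model, letting DBT order and reuse builds.
{{ config(materialized='table') }}

SELECT
    day,
    count(*)   AS transfer_count,
    sum(amount) AS total_amount
FROM {{ ref('stg_transfers') }}
GROUP BY day
```

Here `materialized='table'` tells DBT to persist the result as a table, while the `ref` to the upstream `stg_transfers` model is what DBT uses to build its dependency graph.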

Data Lineage and Modularization in DBT

One of the key strengths of DBT is its ability to break down large SQL queries into smaller, interrelated data models, akin to the modular approach in Substreams. This modularization facilitates data lineage visualization, showing the interconnectivity between different data models. This approach mirrors the philosophy of Substreams, where modules are dependent on each other and can be reused or cached for downstream consumption.

DBT’s Open Source Nature and Developer Ecosystem

DBT is completely open source and boasts a significant developer ecosystem, particularly in the Web 2.0 domain. This ecosystem includes extensive resources, such as courses and documentation available on DBT’s website, which will empower future developers utilizing The Graph.

The Backend: Infrastructure for SQL and DBT Integration | feat. StreamingFast (30:30)

Backend Challenges and Solutions

Developers using The Graph’s technologies face challenges in real-time API access and scaling front-end applications. They often find it difficult to deploy and maintain infrastructure and learn new technologies like Substreams. To address these issues, the goal is to create deployable units, simplifying the deployment of both front-end and backend components. This approach minimizes the complexities related to scaling and hosting.

Integration of Firehose, Substreams, and SQL Sync

The stack begins with Firehose and Substreams, followed by SQL sync to transfer data into databases like Postgres or ClickHouse. DBT is leveraged for data transformation, offering an alternative to Substreams for data manipulation. This setup allows direct querying of the data via GraphQL, REST API, or direct SQL access, facilitating easier data consumption for analytics and front-end applications.

Composable Architecture

The architecture is highly composable, with different configurations to suit various needs. For instance, one setup uses only ClickHouse for data and queries it with SQL, enabling batch and real-time transformations with DBT. Another setup focuses on real-time applications, using Postgres to facilitate scaling and data joining (both on-chain and off-chain). This approach enhances the flexibility and applicability of The Graph’s ecosystem to diverse use cases.

Substreams: SQL Deployable Service

In the demonstration, StreamingFast showcased a deployable unit using Substreams, which integrates SQL definitions directly into the Substreams manifest. This process involves mapping events to a database output (DB out), which is designed to feed into an SQL database. A key feature of this system is the ability to package and share the entire schema as a .spkg file, representing the deployable unit’s version. Upon deployment, the system provides details like the deployment ID and the status of the last processed block.

Two examples were presented: one involving the Bored Ape contract data with a Postgres engine and another featuring CryptoPunks data in a ClickHouse environment, demonstrating the versatility of the system with different database engines and front-end capabilities. The demo concluded by showing how simple SQL queries can be executed against these deployed systems, illustrating the potential for direct, efficient data access and manipulation.
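A query of the sort shown in the demo might look like the following (the `punk_transfers` table and its columns are hypothetical stand-ins for the deployed CryptoPunks sink):

```sql
-- Illustrative query against a deployed ClickHouse sink.
SELECT to_address, count() AS punks_received
FROM punk_transfers
GROUP BY to_address
ORDER BY punks_received DESC
LIMIT 10;
```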

StreamingFast is actively seeking feedback on this new solution and encourages users to try the tutorial and share their experiences. This feedback will be instrumental in refining their approach to create an accessible and efficient development workflow within The Graph’s ecosystem. Please visit their Discord to provide feedback!

Future Developments and Network Integration

Future developments include expanding language support in Substreams, introducing new data services (like Firehose for Bitcoin), and considering different endpoints (like IPFS). The aim is to make these data services deployable across The Graph network, providing more options for resource usage and data privacy.

Questions and Answers

Can all of this be deployed to local infrastructure, or is StreamingFast’s infrastructure necessary?

Everything can be run fully locally in a development environment using Docker Compose; StreamingFast’s infrastructure is not required.

How does DBT work with this setup?

DBT gets deployed in the package, running on a frequency specified by the developer. In a development environment, DBT typically doesn’t run automatically, allowing developers to iterate on their models. DBT can be used for schema definition and periodic data updates in production.

Is it possible to deploy some units to a cloud provider like StreamingFast and have a database hosted elsewhere, or vice versa?

This is a use case that requires further discussion. The goal is to provide flexibility to address various needs, such as pointing Postgres access at a different location. The first iteration is focused on integrating SQL into the network.

How will this fit within the World of Data Services? How will it fit into the economics of The Graph and how the network will support it?

The focus is on wiring these deployable units into The Graph Network for broader offering and query fee generation. Efforts are also being made to tie this to the token economics of The Graph network. The aim is to provide tools and support to help Indexers run these services in the future, with a reference implementation always available for development.

Stay Tuned!

Join us next month for Core Devs Call #27!

Keep up to date by joining discussions in the forum, following The Graph on X/Twitter or joining the Discord server!