The Art of Data Verification in Web3: The Importance of Data Inconsistencies and The Plan to Fix Them

Achieving integrity of web3 data hinges on independent verifiability and reproducibility of computation, rather than depending on centralized monopolies of information. The Graph Network is at the forefront of this movement with hundreds of Indexers ensuring the accuracy of the data they provide.

This decentralized approach is necessary for maintaining data accuracy, but it is not without its challenges.

In the current state of the network, different Indexers can return different results for the same query. This triggers an arbitration process to identify the source of the discrepancy. For the time being this is a fairly manual process, but the network's redundancy makes the work easier: when a discrepancy arises, results can be cross-checked against multiple Indexers, which highlights the source of the inconsistency.
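The cross-check described above can be sketched in a few lines of Python. This is only an illustration, not the network's arbitration tooling: the function names, the JSON canonicalization, and the Indexer identifiers are all assumptions made for the example. It fingerprints each response and separates the largest agreeing group from the Indexers that differ.

```python
import hashlib
import json
from collections import defaultdict


def response_fingerprint(response: dict) -> str:
    """Hash a query response in a canonical JSON form (hypothetical helper)."""
    canonical = json.dumps(response, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cross_check(responses: dict) -> tuple:
    """Split Indexers into the largest agreeing group and everyone else.

    `responses` maps an Indexer identifier to the response it returned
    for the same query. Ties between equally sized groups are broken by
    Indexer name so the result is deterministic.
    """
    if not responses:
        return [], []
    groups = defaultdict(list)
    for indexer, response in responses.items():
        groups[response_fingerprint(response)].append(indexer)
    ranked = sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))
    majority = sorted(ranked[0])
    outliers = sorted(i for group in ranked[1:] for i in group)
    return majority, outliers


# Made-up responses from three hypothetical Indexers.
responses = {
    "indexer-a": {"data": {"balance": "100"}},
    "indexer-b": {"data": {"balance": "100"}},
    "indexer-c": {"data": {"balance": "99"}},
}
majority, outliers = cross_check(responses)
print(majority, outliers)
```

Agreement among a majority does not prove that the majority is correct. A check like this only flags which responses deserve a closer look.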

One of the pivotal mechanisms that underpin this process is the Proof of Indexing (PoI), an integral component of The Graph’s framework. The PoI is a cryptographic proof that an Indexer submits to show that it has accurately processed and stored the data from a subgraph. It serves as an assurance that the data processed by the Indexer is correct and reliable. In addition, every query response comes with a signed attestation by the Indexer. If incorrect data is served by an Indexer, the latest PoI and signed attestation are used to examine the contested query, possibly leading to an Indexer being slashed.
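To make the idea of a signed attestation concrete, here is a minimal sketch of how a response could be bound to the query it answers. It is not The Graph's attestation format: the field names are hypothetical, and HMAC with a shared key stands in for the Indexer's signature so that the example runs on the Python standard library alone. A production scheme would use an asymmetric signature that anyone can verify.

```python
import hashlib
import hmac


def _digest(text: str) -> str:
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_attestation(key: bytes, deployment_id: str, query: str, response: str) -> dict:
    """Bind a response to its query and subgraph deployment (illustrative format)."""
    request_hash = _digest(query)
    response_hash = _digest(response)
    message = f"{deployment_id}:{request_hash}:{response_hash}".encode("utf-8")
    # Assumption: HMAC is a stand-in for the Indexer's real signature.
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {
        "deployment_id": deployment_id,
        "request_hash": request_hash,
        "response_hash": response_hash,
        "signature": signature,
    }


def attestation_matches(key: bytes, attestation: dict, query: str, response: str) -> bool:
    """Check that an attestation covers exactly this query and response."""
    expected = make_attestation(key, attestation["deployment_id"], query, response)
    return hmac.compare_digest(expected["signature"], attestation["signature"])


key = b"hypothetical-indexer-key"
att = make_attestation(key, "example-deployment", "{ tokens { id } }", '{"data":{"tokens":[]}}')
print(attestation_matches(key, att, "{ tokens { id } }", '{"data":{"tokens":[]}}'))
```

Because the attestation commits to both the request and the response, an Indexer cannot later claim that a contested response answered a different query.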

Contrast this with conventional, centralized solutions where you typically pay a premium for data services. In such models, inaccuracies in data often go unnoticed until pointed out by users or accidentally discovered by the development team. The lack of a broad data consensus makes it challenging to spot and rectify errors, further emphasizing the importance and necessity of multiple sources of truth as championed by The Graph.

The Role of Indexers and the Variability of Data When Forming Consensus

Indexers compete to serve data queries in public markets, employing a diverse array of hardware, tooling, and optimizations to improve efficiency. This competition not only fosters a robust environment for innovation but also adds a layer of resiliency and censorship resistance to the network.

The Graph’s approach will eventually lead to the most trustworthy system for accessing verifiable information. However, in the early stages, the network relies on an Arbitration Committee, and there can occasionally be situations that lead to short-term data inconsistencies. Processes are being refined for detecting inconsistencies, launching disputes, providing arbitration, and slashing faulty Indexers. While the visibility of these inconsistencies highlights the system’s robustness in retrieving data, the inconsistencies themselves can be a source of frustration for developers.

Core developers focused on The Graph are aware of these challenges and are actively working on solutions to mitigate them.

How The Graph is Working on Fixing The Problem: Graphix

Recognizing the challenges that data inconsistency presents, The Graph community has been diligently working to address these issues, all while maintaining its commitment to transparent and accurate data. One such effort is the development of Graphix, a cross-checking system for indexing and query results. Graphix is software built by Edge & Node that is designed to detect inconsistencies in indexing results through the use of PoIs and to drastically shorten the time needed to triage issues, find their root cause, and drive toward resolution.
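The kind of PoI cross-checking described above can be illustrated with a short sketch. This is not Graphix's implementation or data model: the `PoiReport` record, the function names, and the sample values are assumptions made for the example. It groups reported PoIs by subgraph deployment and block, keeps only the places where Indexers disagree, and finds the earliest divergent block per deployment as a starting point for triage.

```python
from collections import defaultdict
from typing import NamedTuple


class PoiReport(NamedTuple):
    """One Indexer's PoI for a deployment at a block (hypothetical record)."""
    indexer: str
    deployment: str
    block: int
    poi: str


def find_divergences(reports: list) -> dict:
    """Return {(deployment, block): {poi: [indexers]}} wherever PoIs differ."""
    by_key = defaultdict(lambda: defaultdict(list))
    for r in reports:
        by_key[(r.deployment, r.block)][r.poi].append(r.indexer)
    return {
        key: {poi: sorted(indexers) for poi, indexers in pois.items()}
        for key, pois in by_key.items()
        if len(pois) > 1  # more than one distinct PoI means disagreement
    }


def first_divergent_block(divergences: dict) -> dict:
    """Earliest block with a disagreement, per deployment."""
    earliest = {}
    for deployment, block in divergences:
        if deployment not in earliest or block < earliest[deployment]:
            earliest[deployment] = block
    return earliest


# Made-up reports: the Indexers agree at block 100 and diverge from block 200.
reports = [
    PoiReport("indexer-a", "dep-1", 100, "0xaaa"),
    PoiReport("indexer-b", "dep-1", 100, "0xaaa"),
    PoiReport("indexer-a", "dep-1", 200, "0xbbb"),
    PoiReport("indexer-b", "dep-1", 200, "0xccc"),
    PoiReport("indexer-a", "dep-1", 300, "0xddd"),
    PoiReport("indexer-b", "dep-1", 300, "0xeee"),
]
divergences = find_divergences(reports)
print(first_divergent_block(divergences))
```

Narrowing a disagreement down to the first block where PoIs differ is what turns a vague "these Indexers disagree" into a concrete place to start looking for the root cause.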

Imagine a future where conflicting data does not necessitate manual intervention. As Ethereum and web3 gain further adoption, such independent validation of data becomes an essential component of navigating the internet.

The development of Graphix signifies a substantial stride in making this future a reality. Graphix will greatly improve time to resolution for data inconsistencies, bolstering the reliability and trustworthiness of The Graph Network.

Final thoughts

The shift towards a decentralized data architecture represented by The Graph and web3 fundamentally transforms the way we interact with, validate, and trust data. This model leverages multiple Indexers to offer an ecosystem where truth is established by network consensus, thereby reducing dependence on a single source of “truth.” Although these are early days and the processes for detecting and resolving data inconsistencies still need refinement, this architecture is a significantly better way to provide verifiable information without depending on centralized intermediaries.

Current engineering work includes improving the effectiveness of cryptoeconomic security in the network, while research engineering continues on the bleeding edge of cryptographic verifiability… Novel methods are being developed for cryptographic verifiability under the Verifiable Queries workstream, further bolstering data accuracy and reducing the need for economic actors to enforce security.

The importance of independent validation for data accuracy cannot be overstated. The Graph and its decentralized network of Indexers play a pivotal role in this landscape, facilitating a move towards a new era of trust and empowerment in the web3 space. With these developments, we are witnessing a seismic shift from the traditional data consumption model to one that is transparent, accessible, and decentralized.