GRC-001: Graphcast - a Gossip Network for Indexers

Objective of This GRC

GraphOps first presented the idea of a Gossip Network for Indexers in late 2021, and following our Core Developer Grant in July, we have dedicated our resources to making it a reality!

This GRC presents our approach for a Gossip Network for Indexers (called Graphcast). In sharing this post, we intend to solicit feedback from the wider community which we will use to inform the design decisions for this project. Please read the GRC in full and share your thoughts and feedback in the comments below.

High-Level Description

Graphcast will act as a decentralized, distributed peer-to-peer (P2P) communication tool that allows Indexers across the network to exchange information in real time. Today, network participants coordinate with one another using the protocol by submitting on-chain transactions that update the shared global state in The Graph Network. These transactions cost gas, which makes some types of signaling or coordination between participants too expensive. In other words, the cost of sending a signal to other network participants is set by the cost of transacting on the Ethereum blockchain. Graphcast solves this problem.

Graphcast will act as an optional off-chain layer of infrastructure that Indexers can opt into independently of on-chain protocol operations. The cost of exchanging P2P messages is near zero, providing an ideal environment for participants to coordinate without concern for cost. It is worth noting that this comes with tradeoffs, most importantly that the integrity (i.e. the truthfulness) of a message is not guaranteed by default. Nevertheless, Graphcast aims to provide message validity guarantees (i.e. that the message is valid and signed by a known protocol participant).

We intend to release a Software Development Kit that allows developers to build Radios, which are gossip-powered applications that Indexers can run to serve a given purpose. In this GRC we also present a number of use cases for Graphcast, each of which may become a Radio. Our proof of concept is focused on the use case of real-time cross-checking of Indexer Proofs of Indexing (POIs).

Motivation

The main goal of Graphcast is to unlock new forms of healthy coordination among Indexers. Currently, if Indexers want to coordinate in the network, they need to send on-chain transactions to exchange messages. The gas costs of those transactions create a lower bound on the value of a message or signal that is worth posting on-chain, which inevitably prices out some coordination that would otherwise be beneficial to the participants. Graphcast provides an off-chain P2P messaging layer that will address the cost aspect of this problem because messaging through the network is near-free. Of course, that comes with its own set of tradeoffs, and Graphcast should not be thought of as a replacement for existing on-chain coordination.

Developers building tooling for Indexers and other participants focus mostly on their application logic (and rightly so), which can lead to design decisions that aim to minimize additional complexity. Since centralized command and control of an application is easy and efficient, that naturally leads to the centralization of the data in those applications. Graphcast provides a neutral messaging backend on top of which ecosystem tooling developers can build their applications (called Radios). In other words, the aim is to give developers a construct that makes it much easier to build their ecosystem applications in a decentralized manner from the start. An additional bonus is that developers also won’t have to manage backend servers!

This approach to building ecosystem tooling will also democratize access to the underlying data that powers these applications, which will deter metadata monopolies that may emerge from tools using centralized messaging solutions that gain large-scale adoption. It also enables different applications to be built on top of the same underlying stream of gossip messages.

There are, of course, some considerations to this approach, which include:

  • No strong guarantees about the integrity (honesty) of messages passed in the network.
  • Difficulties in testing the Radios, since the Radio developer would need to spin up multiple instances to ensure their logic is behaving as expected across a network of Radios. We will provide a test harness (possibly in the form of a Dockerfile) that will help Radio developers with this process.
  • Lack of multi-language support (at least in the beginning).
  • Inspectability and visibility limitations, which can lead to difficulties in monitoring and debugging.

Use Cases

Graphcast will unlock tangible benefits for Indexers to coordinate with each other, as it will practically enable a whole new design space for low-cost coordination. Some examples of Radios include:

  • Conducting auctions and coordination for warp syncing subgraphs, substreams, and Firehose data from other Indexers.
  • Self-reporting on active query analytics, including subgraph request volumes, fee volumes, etc.
  • Self-reporting on indexing analytics, including subgraph indexing time, handler gas costs, indexing errors encountered, etc.
  • Self-reporting on stack information including graph-node version, Postgres version, Ethereum client version, etc.
  • Real-time cross-checking of subgraph data integrity, with active bail-out in the case of diverging from stake-weighted POI consensus.
  • AutoAgora cross-Indexer signals/negotiations for improved automated query pricing.

The potential use cases for Graphcast in the long run are vast and open to the imagination and creativity of the community. Radios could also focus on coordination among other network participants, such as Curators or Delegators. For example, a Curator might do subgraph discovery through a Radio built for that purpose, curating on a subgraph only once they know an Indexer can sync the deployment up to the chain head.

Technology Stack

After reviewing the landscape of P2P networking stacks, we plan on using Waku V2. Waku is built on libp2p (used by the Ethereum Beacon Chain) and prioritizes adaptive networking, node privacy, message unlinkability, and modularity as design goals.

We will start by implementing the Graphcast SDK, a TypeScript SDK that abstracts over Waku, provides identity resolution, and that will serve as a base client to Graphcast. It will expose a rich interface for building Radios.

Our perspective on the benefits of using Waku over alternative solutions is covered in the points below, but we welcome feedback and alternative approaches. All our code will be open-source.

This project will have many iterations, so design decisions being made for the first iteration should by no means limit exploring different ones in the future. Gossip messages are defined using Protobuf, providing a good basis for portability and SDK implementations in other languages if required.

Initially, we considered a few other options:

  • Building our solution on top of libp2p
    We could always leverage the underlying technology that Waku and most other gossip protocols are built on: libp2p. But that would add development overhead and force us to “reinvent the wheel” when it comes to features that Waku provides out of the box, like message signing and encryption, protection against network-level spam, and a level of abstraction over the raw protocol in general.
  • Using a solution like GunDB or OrbitDB
    We explored the idea of using a distributed multi-party database into which peers would write directly, but we decided that the Graphcast SDK should remain ephemeral and that, if needed, data storage should be handled by the Radio.
  • Other gossip clients built on top of libp2p
    Of course, Waku has alternatives that pretty much aim to accomplish the same thing. The reason we went with Waku instead of one of those is that Waku is the most “battle-tested” of them all and the development team (as well as the community) behind it is committed to further enhancing the product and making it the main solution of choice for messaging in web3.

Proof of Concept

We are currently iterating on a proof of concept release, intended to validate our approach before we release an MVP for others to try. The proof of concept is focused on a single Radio implementation: real-time POI cross-checking. The key requirement for an Indexer to earn indexing rewards is to submit a valid Proof of Indexing. The importance of valid POIs causes many Indexers to alert each other on subgraph health in community discussions.

The Radio works by aggregating normalized POIs (nPOIs) from all participating Indexers and weighting them by Indexer stake. If the local nPOI does not match the nPOI that is backed by the most stake, the Radio updates the cost model for the diverged subgraph deployment to an extremely high price to deter query traffic. This real-time view of aggregate stake-weighted POIs enables rapid detection of POI divergence. Learn more about how this works in its GitHub repo.
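The stake-weighted comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Radio's actual code: the `NpoiAttestation` shape, the `crossCheck` function, and the use of plain numbers for stake are all hypothetical, and the sketch only weighs attestations received from peers (it does not add the local Indexer's own stake).

```typescript
// Hypothetical shape of one gossiped nPOI attestation; field names are assumptions.
interface NpoiAttestation {
  indexer: string; // on-chain Indexer address
  npoi: string;    // normalized POI for one deployment at one block
  stake: number;   // Indexer stake, as a plain number for simplicity
}

interface CrossCheckResult {
  topNpoi: string | null; // nPOI backed by the most stake, or null if nothing was received
  topStake: number;       // total stake behind topNpoi
  diverged: boolean;      // true when the local nPOI differs from topNpoi
}

// Sum stake per distinct nPOI, then compare the stake-weighted leader to the local nPOI.
function crossCheck(localNpoi: string, remote: NpoiAttestation[]): CrossCheckResult {
  const stakeByNpoi: Record<string, number> = {};
  const counted: Record<string, boolean> = {};

  for (const attestation of remote) {
    // Count each Indexer at most once so repeated messages cannot inflate the weight.
    if (counted[attestation.indexer]) continue;
    counted[attestation.indexer] = true;
    stakeByNpoi[attestation.npoi] = (stakeByNpoi[attestation.npoi] || 0) + attestation.stake;
  }

  let topNpoi: string | null = null;
  let topStake = 0;
  for (const npoi of Object.keys(stakeByNpoi)) {
    const stake = stakeByNpoi[npoi] || 0;
    // Strict comparison: on a tie, the nPOI seen first keeps the lead.
    if (stake > topStake) {
      topNpoi = npoi;
      topStake = stake;
    }
  }

  return { topNpoi, topStake, diverged: topNpoi !== null && topNpoi !== localNpoi };
}
```

For instance, with 100 and 200 stake attesting to one nPOI and 250 stake attesting to another, the first nPOI leads with 300, so an Indexer whose local nPOI matches it is not flagged, while one reporting the second nPOI is. What the Radio does on divergence (raising the cost model price) is left out of the sketch.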

The proof of concept is split into two parts: the Gossip SDK implementation and the POI cross-checker Radio that is built on top of it.

The SDK is the core that abstracts all the necessary components of each Radio away from the user (the user being the Radio developer). That includes:

  • Connecting to Graphcast, i.e., to other peers in the network.
  • Interactions with the Ethereum network and The Graph stack.
  • Resolving the sender identity with the help of a registry contract that matches the message sender to their on-chain identity.
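The identity-resolution step in the last point can be pictured roughly as follows. This is a minimal sketch, not the SDK's API: the registry is modelled as an in-memory lookup table rather than the on-chain contract, recovering the signer address from the message signature is assumed to have happened already, and all names (`Registry`, `GossipMessage`, `resolveSender`) are hypothetical.

```typescript
// Hypothetical stand-in for the registry contract: maps a message signer's
// address (lowercase) to the on-chain Indexer address that registered it.
type Registry = Record<string, string>;

interface GossipMessage {
  payload: string;
  signer: string; // address recovered from the message signature (recovery not shown)
}

interface ResolvedMessage {
  payload: string;
  indexer: string; // on-chain identity the signer maps to
}

// Return the message tagged with its on-chain identity, or null when the
// signer is not registered and the message should be dropped.
function resolveSender(msg: GossipMessage, registry: Registry): ResolvedMessage | null {
  const key = msg.signer.toLowerCase();
  // Only accept keys that were actually registered, not inherited object properties.
  if (!Object.prototype.hasOwnProperty.call(registry, key)) return null;
  const indexer = registry[key];
  if (indexer === undefined) return null;
  return { payload: msg.payload, indexer };
}
```

A Radio built this way would only act on messages for which `resolveSender` returns a result, which corresponds to the validity guarantee (signed by a known participant) rather than to any guarantee that the payload is truthful.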

We will be further developing the proof of concept during the next few weeks, in order to test the most critical pieces/mechanisms. Thereafter we will move on to developing a Minimum Viable Product (MVP) intended to be run by the wider community.

You can find more information about the proof of concept in its GitHub repo. It’s important to note that this proof of concept is not meant to run in production, but just to demonstrate how all of the concepts of Graphcast will work.

The proof of concept will be demonstrated during the Monthly Core Dev Call tomorrow (Sep 1). That demo will be recorded and we will share the recording in the comments of this post.

Risks and Challenges

Everything in software is a trade-off, and Graphcast is no different. There are some risks and challenges that need to be considered:

  • Data integrity by reputation
    Since sending messages on Graphcast is free, some participants may, in particular situations, have an incentive to send false or misleading messages. This is why, depending on the nature of the Radio, a reputation system can be vital. That system could take the form of local or shared reputation models, or of external or in-protocol economic security through slashing.
  • Versioning
    We need a strategy for versioning the base layer of Graphcast, whether it’s distributed as an SDK or a standalone application. Changes in the logic (both breaking and non-breaking) could cause issues in the coordination process. An automated release process would be the optimal solution, with a handy way to notify the users of a new release when they run their Radio, prompting them to update.
  • Dependency failure
    As with all software, Graphcast will not be built “from scratch.” It will rely mainly on the Waku TypeScript SDK, but also on other libraries in the Node.js ecosystem, which naturally adds the risk of dependency failure. We can mitigate that risk by having regular CI (continuous integration) builds and dependency checks.
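To make the versioning concern above more concrete, here is one possible shape for a compatibility check between a local Radio and a peer. It is only a sketch under assumptions that the GRC does not state: that versions follow a `major.minor.patch` scheme, that a differing major version signals a breaking change, and that a peer's version is known to the Radio. The names `parseVersion` and `checkPeerVersion` are hypothetical.

```typescript
type Compatibility = "compatible" | "update-available" | "incompatible";

// Parse "major.minor.patch" into three numbers; missing or malformed parts become 0.
function parseVersion(version: string): [number, number, number] {
  const parts = version.split(".");
  const part = (i: number): number => parseInt(parts[i] || "0", 10) || 0;
  return [part(0), part(1), part(2)];
}

// Compare the local version with a peer's version.
// A different major version is treated as a breaking change; a newer minor
// or patch version on the peer suggests prompting the operator to update.
function checkPeerVersion(local: string, peer: string): Compatibility {
  const l = parseVersion(local);
  const p = parseVersion(peer);
  if (l[0] !== p[0]) return "incompatible";
  if (p[1] > l[1] || (p[1] === l[1] && p[2] > l[2])) return "update-available";
  return "compatible";
}
```

Under these assumptions, a Radio on `1.2.0` would keep coordinating with a peer on `1.3.1` while nudging its operator to update, and would ignore a peer on `2.0.0`.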

Next Steps

Here are our main objectives for the near future:

  • Collect feedback from Indexers, as well as the wider community.
  • Publish the on-chain registry contract that will be used to connect a Graphcast participant’s identity to their on-chain identity in The Graph protocol.
  • Publish Graphcast SDK so that it can be used as a library (on npm, for instance) for the development of Radios.
  • Further develop the Graphcast SDK and the POI cross-checker Radio to get them to an MVP stage. You can follow the progress on the GitHub page of the proof of concept.
  • Discuss the prospects and future of Graphcast with other Indexers during Indexer Office Hours (IOH).

Copyright Waiver

Copyright and related rights waived via CC0.


Folks from Streamr Network have proposed their messaging layer as an alternative to Waku: The Graph Indexer network real time communications

Looking forward to collaborating!


Graphcast will enable a simpler, more robust Graph Protocol.

For example, we currently have a problem caused by indexers allocating on a subgraph before indexing. Because indexers have not yet synced, they are unable to serve queries. Yet, their allocation deters other indexers from participating. Quality of service may be degraded as a result. One potential solution would require a PoI when opening an allocation - indicating an indexer was ready to serve queries. This solution was not adopted because the allocation signals intent, preventing congestion.

Once the protocol is relieved of its responsibility of signaling, I believe that many discussions like this one will be more straightforward as the tradeoff space is reduced.


Super interesting. Can’t wait to check out the demo tomorrow. Best of luck!


Hey everyone! Thanks for all the feedback, comments and questions on the Core Dev Call yesterday. As promised, here’s a recording of a demo of the proof of concept, recorded about a month ago - the “frontend”, workflow and logs are more or less the same since then, but we’ve also added a bunch of new features that will be laid out in the README of the proof of concept repo soon. We will record another demo once the proof of concept is finished, where we will dive deeper into how things work.


@Oliver asked a great question about the regulatory risks associated with Graphcast in the Core Dev Call (The Graph | Core Devs Call #15 - September 1st 2022 - YouTube), and also specifically mentioned Price Fixing as a risk.

On regulatory risk:

Each Indexer is subject to the regulations and legal requirements of the jurisdiction in which they operate. Graphcast is merely a messaging technology and will not prevent participants from using it in a way that may expose them to regulatory risk. Doing so would be against the spirit of decentralisation and permissionless participation, so it’s really up to each Indexer to ensure that they are following the laws applicable to them.

On Price Fixing with Graphcast:

Graphcast will not prevent price coordination from happening via gossip. If sufficiently motivated parties want to coordinate on price, there are a million ways to do so, and if it were to happen via gossip, at least that is fully transparent to observers, rather than via “dark channels”.

On Price Fixing more generally:

Pricing should generally resolve to some natural equilibrium, driven by the underlying factors of supply and demand.

Price fixing is most problematic in markets that are not permissionless or otherwise have high costs and/or friction associated with entering the market. Rigid bases of supply are more susceptible to capture by a price fixing cabal.

For example, in markets where there are regulatory barriers to entry (e.g. pharmaceuticals), price fixing is much easier to execute, because new market entrants are much less able to take advantage of the pricing opportunity created by price fixing.

In permissionless markets, new entrants are always able to enter the market and compete with existing suppliers. This makes it much more difficult for price fixing to take hold of the market. Instead, as price fixers raise their prices, they create strong incentives for revenue-motivated indexers to swoop in and serve demand at a lower price, making mid-market price more reflective of the natural market equilibrium.


I think requiring a valid POI upon opening an allocation is still a good idea. I think this would achieve a higher QoS, especially if the idea of a self-signed query were implemented, indicating that the Indexer was also properly configured to serve queries.

VERY excited to participate in Graphcast! A few things I am most excited for would be:

  • Error coordination: both on a failed subgraph level and also on indexer level errors.
  • Protocol Testing: Not sure exactly how this would be implemented, but there could likely be a good method of coordinating on errors encountered on tagged versions if they could be automatically broadcast to a Radio.
  • Node version coordination: especially with a multi-chain mainnet it’d be great to see how many indexers are running each respective client and version.
  • Node spec coordination: It would be amazing for indexers to be able to share hardware requirements and feature specifications, e.g. “Network: Mainnet, Storage Requirement: 1.3TB, Supports Traces: True”
  • While some indexers may want to keep some of this data close to their chest, I’d be interested in knowing other indexers’ interest/willingness to collaborate on database-level configs (RAM allocated, shared buffer size, etc.)