The Graph Indexer network real-time communications

Hi! I’m Henri Pihkala from Streamr – a decentralized real-time data infrastructure project.

The Streamr Network is a peer-to-peer network for publishing and subscribing to data in real time. The Network consists of nodes that interconnect peer-to-peer using the Streamr protocol to form a topic-based publish-subscribe messaging system. Topics in this messaging system are called streams. The job of the Network is to deliver every message published to a stream to all subscribers of that stream.

Late last year we completed three testnets with over 90 thousand interconnected nodes streaming messages to each other. Earlier this year we launched the production-ready decentralized Network, which maintains over four thousand interconnected Broker nodes.

Streamr Network properties

  • Censorship resistant publish/subscribe messaging at scale
  • Sub-second ordered message delivery
  • Cryptographically signed and end-to-end encrypted messages
  • Composable smart contract access control
  • Runs anywhere JavaScript runs, including the browser
  • Free to use, peer-to-peer architecture
  • Pseudonymous messaging

Streamr is a power user of The Graph (TG). It depends on TG for complex real-time state queries against its on-chain stream registry, for example to determine which Ethereum identities have permission to publish or subscribe to streams of data in the Network, among many other use cases. We believe that Streamr could also be used by TG internally to interconnect and extend the capabilities of its own Indexer network.

By introducing decentralized real-time communications into TG’s stack, the TG protocol will be better able to maintain itself, and do so without compromising on TG’s open and decentralized nature. On Streamr, every data point is cryptographically signed by the data publisher using their Ethereum private key, so that provenance and tamper-proof guarantees are as strong as Ethereum itself.

Indexer performance issues and other problems could be quickly identified, benchmarked, and solved if the Indexer network openly shared real-time stateful logs containing:

  • Observed block height & hashes
  • Proof of Indexing (PoI) hashes & PoI data
  • Subgraph indexing logs, warnings and errors
  • Local machine activity logs
  • And more possibilities
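
To make the list above a bit more concrete, here is a minimal sketch of what one such data point could look like as a typed message, with a basic sanity check before publishing. The type name, the field names, and the assumption that PoI and block hashes travel as 32-byte hex strings are all illustrative; none of this is an existing TG or Streamr schema.

```typescript
// Hypothetical payload shape; all names here are illustrative assumptions.
type LogLevel = "info" | "warning" | "error";

interface IndexerStatusMessage {
  indexer: string;            // Ethereum address of the publishing Indexer
  subgraphDeployment: string; // deployment the report refers to
  blockNumber: number;        // observed block height
  blockHash: string;          // observed block hash at that height
  poi?: string;               // Proof of Indexing hash, when available
  logs?: { level: LogLevel; text: string }[];
  timestamp: number;          // Unix time in milliseconds
}

const ADDRESS = /^0x[0-9a-fA-F]{40}$/; // 20 bytes, hex encoded
const HEX_32 = /^0x[0-9a-fA-F]{64}$/;  // 32 bytes, hex encoded

// Returns a list of problems; an empty list means the message looks well-formed.
function validateStatus(msg: IndexerStatusMessage): string[] {
  const problems: string[] = [];
  if (!ADDRESS.test(msg.indexer)) {
    problems.push("indexer is not a 20-byte hex address");
  }
  const n = msg.blockNumber;
  if (!isFinite(n) || Math.floor(n) !== n || n < 0) {
    problems.push("blockNumber must be a non-negative integer");
  }
  if (!HEX_32.test(msg.blockHash)) {
    problems.push("blockHash is not a 32-byte hex string");
  }
  if (msg.poi !== undefined && !HEX_32.test(msg.poi)) {
    problems.push("poi is not a 32-byte hex string");
  }
  return problems;
}
```

Keeping logs and the PoI optional would let the same message type cover both lightweight block height heartbeats and richer reports.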

In this hypothetical upgrade, Indexers interconnect and form a network topology, publishing stateful data points to each other as well as external interested subscribers, such as the TG core devs and subgraph developers.

Real-time data sharing and interconnection should improve support outcomes as well as protect Indexers from slashing. For example, Indexers could compare indexing hashes with their peers in real time. If others saw different data, an Indexer could run a recovery process before submitting erroneous data as its PoI.
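
The peer comparison described above could be sketched roughly as follows. This is only an illustration of the idea: the `PoiReport` shape, the `checkPoi` function, and the simple "most common value wins" rule are assumptions made for this example, not an existing TG or Streamr mechanism, and a real design would likely weigh reports by stake or reputation.

```typescript
// Hypothetical report shape; names are illustrative assumptions.
interface PoiReport {
  indexer: string;     // Ethereum address of the reporting Indexer
  blockNumber: number; // block the PoI was computed for
  poi: string;         // hex-encoded PoI hash
}

type PoiCheck =
  | { status: "no-peers" }
  | { status: "agree"; agreeing: number; total: number }
  | { status: "diverged"; agreeing: number; total: number; mostCommonPoi: string };

// Compares the local PoI with peer reports for the same block.
function checkPoi(local: PoiReport, peers: PoiReport[]): PoiCheck {
  const localPoi = local.poi.toLowerCase();

  // One vote per peer; a later report from the same peer replaces an earlier one.
  const votes: Record<string, string> = {};
  for (const r of peers) {
    if (r.blockNumber === local.blockNumber && r.indexer !== local.indexer) {
      votes[r.indexer] = r.poi.toLowerCase();
    }
  }
  const voters = Object.keys(votes);
  if (voters.length === 0) return { status: "no-peers" };

  // Count how many peers reported each distinct PoI value.
  const counts: Record<string, number> = {};
  for (const v of voters) {
    const poi = votes[v];
    counts[poi] = (counts[poi] || 0) + 1;
  }
  let mostCommonPoi = "";
  let best = 0;
  for (const poi of Object.keys(counts)) {
    if (counts[poi] > best) {
      best = counts[poi];
      mostCommonPoi = poi;
    }
  }

  const agreeing = counts[localPoi] || 0;
  const total = voters.length;
  // A tie with the leading value counts as agreement; only a smaller count is flagged.
  if (agreeing >= best) return { status: "agree", agreeing, total };
  return { status: "diverged", agreeing, total, mostCommonPoi };
}
```

An Indexer could run a check like this just before submitting a PoI and trigger its recovery process whenever the result is `diverged`.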

Since TG’s Indexer software is written in TypeScript and runs on Node.js, it would be straightforward to include the streamr-client NPM package in the TG Indexer. Using this library, every TG Indexer becomes a node in the relevant Streamr peer-to-peer topologies and can broadcast messages to other Indexers listening for events in the same stream/topic. The TG Indexer use case can be modeled as a stream per subgraph, a global stream joined by all Indexers, or a combination thereof.
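
As a rough sketch of the combined model (one global stream plus one stream per subgraph), the snippet below uses a small in-memory publish/subscribe stand-in so that it runs without a network. The `PubSub` interface, the `InMemoryPubSub` class, and the stream naming scheme are all hypothetical and are deliberately not the streamr-client API; in a real integration the stand-in would be replaced by the client library.

```typescript
type Handler = (message: unknown) => void;

// Hypothetical transport interface, NOT the streamr-client API.
interface PubSub {
  publish(streamId: string, message: unknown): void;
  subscribe(streamId: string, handler: Handler): void;
}

// In-memory stand-in so the sketch runs locally without a network.
class InMemoryPubSub implements PubSub {
  private handlers: Record<string, Handler[]> = {};

  publish(streamId: string, message: unknown): void {
    for (const h of this.handlers[streamId] || []) h(message);
  }

  subscribe(streamId: string, handler: Handler): void {
    this.handlers[streamId] = this.handlers[streamId] || [];
    this.handlers[streamId].push(handler);
  }
}

// Hypothetical naming scheme for the two kinds of stream.
const GLOBAL_STREAM = "indexers/global";
function subgraphStream(deploymentId: string): string {
  return "indexers/subgraph/" + deploymentId;
}

// An Indexer joins the global stream plus one stream per subgraph it indexes.
function joinStreams(
  bus: PubSub,
  deployments: string[],
  onMessage: (streamId: string, message: unknown) => void
): void {
  const ids = [GLOBAL_STREAM].concat(deployments.map(subgraphStream));
  for (const id of ids) {
    bus.subscribe(id, (m) => onMessage(id, m));
  }
}
```

With per-subgraph streams, an Indexer only receives traffic for the subgraphs it actually indexes, while the global stream stays available for network-wide signals such as observed block heights.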

This messaging functionality adds some CPU, memory, and bandwidth usage, which depends on the volume of messages in the streams but should remain negligible in this use case.
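
For a back-of-the-envelope feel for the bandwidth side, inbound traffic can be estimated from message size, publishing rate, and the number of publishers and streams. The function below is a simplified model with made-up parameter names; it ignores protocol overhead and duplicate deliveries, and any inputs fed to it are assumptions rather than measurements.

```typescript
// Simplified traffic model; all field names are illustrative assumptions.
interface TrafficAssumptions {
  messageBytes: number;                   // average size of one message
  messagesPerMinutePerPublisher: number;  // how often each Indexer publishes
  publishersPerStream: number;            // Indexers publishing to a stream
  streams: number;                        // streams this Indexer subscribes to
}

// Estimated inbound payload bytes per second for one subscribing Indexer.
// Every subscriber receives every message from every publisher in a stream.
function estimateInboundBytesPerSecond(a: TrafficAssumptions): number {
  const bytesPerMinute =
    a.messageBytes *
    a.messagesPerMinutePerPublisher *
    a.publishersPerStream *
    a.streams;
  return bytesPerMinute / 60;
}
```

As a purely hypothetical example, 500-byte messages sent 6 times a minute by 20 publishers on each of 10 streams would work out to 10,000 bytes per second inbound under this model.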

In short, we are looking for feedback on conducting a proof of concept where Indexers share their real-time state, with the goal of enabling automatic recovery functionality and other reactions based on the data. We’re interested to hear about any technical limitations we might encounter, opportunities to explore, and ways to collaborate more closely with the TG core devs.

Any links to previous work or discussions in this direction, as well as thoughts, ideas, and wish lists, would be highly appreciated.

Thanks!

Hey! I’ve been supporting this proposal and some recent updates have come up today that are helpful to share here.

The GraphOps team have built a Gossip Client PoC that matches the intent of this proposal. The biggest difference, of course, is the choice of data transport protocol: the current PoC uses the Waku v2 protocol, whereas our proposal would use Streamr instead.

At a quick glance, here are some of the technical benefits of using Streamr for this use case:

  • The Streamr protocol shares its cryptography with Ethereum, so all messages are natively attributable to Ethereum wallets, which is great for networks like The Graph where Indexers have Ethereum identities
  • In Streamr, all Indexers become light nodes in the network, so there’s no reliance on intermediate nodes that are uninterested in the data
  • Streamr is provably performant and low latency at scale
  • Streamr implements end-to-end encryption out of the box, if needed, and implements an access control layer based on that
  • Streamr will have (in Q1 2023) incentive mechanisms to protect streams against disruptions caused by churn, eclipse attacks, and difficult NATs
  • The Streamr ecosystem has pretty good discovery tools for sharing data with external audiences as well

We’ll attend the next R&D call to answer any questions about our PoC proposal!

Hey @henripihkala @matthew_streamr!

Incredible timing!

Our GRC for Graphcast was just published:
GRC-001: Graphcast - a Gossip Network for Indexers

Great to see that you concur that the approach would be valuable for the network! We welcome your thoughts on how we can improve the current design, and we’re definitely open to exploring Streamr as an alternative messaging backend.

@chris Fantastic, Matthew and I will join the call tomorrow to exchange some thoughts around this!

Sadly I’ll be flying during the call tomorrow, but @AxiomaticAardvark will be there. I’ll send you a DM now to get connected and schedule some time for a deeper chat.
