GIP-0035: GraphQL `@live` queries

GIP: 0035

Title: GraphQL Live Queries

Authors: Dotan Simha dotan@the-guild.dev

Created: 2022-06-22

Stage: Draft

Discussions-To: https://forum.thegraph.com


Abstract

Today, The Graph API provides a rich interface for querying data indexed from the blockchain. This interface consists of a Query type that serves as the entry point into the graph of data.

To provide an easier protocol for querying real-time data, we are looking into the option of implementing @live queries as part of the GraphQL interface of The Graph.

This GIP describes The Graph’s real-time querying policy and a new suite of tools to help dApp developers query real-time data from their Subgraphs.

High-Level Description

This GIP describes the following:

  • The background for choosing GraphQL @live queries over GraphQL Subscriptions
  • The network transport that will be used for shipping the responses
  • The execution flow and integration with the network gateway
  • The compatibility layer for graph-node
  • Query cost calculations

Subscriptions vs @live queries

The discussion around real-time GraphQL begins with the differences between GraphQL @live queries and GraphQL Subscriptions.

The main difference is around the approach:

While a GraphQL @live query is an extension of a regular GraphQL query { ... }, with just a directive added on specific fields, the GraphQL Subscription approach is event-based.

Here’s an example of a @live query that asks to watch specific fields out of the entire query:

# Regular GraphQL query annotated with the @live directive
# In this example, the consumer controls which fields need to be watched in real-time.
query tokens {
  pairs {
    token1 {
      id
      name
      decimals @live
    }
    token2 {
      id
      name
      decimals @live
    }
  }
}

With the Subscriptions approach, the consumer is limited to the event that caused the change, and complex filtering must be managed before getting to the actual data changes:

# GraphQL subscription watching specific events; based on the event, we can reach the data we need.
subscription onPairTokenChange {
  onPairTokenChange(pair: "...") {
    pair {
      token1 {
        id
        name
        decimals
      }
      token2 {
        id
        name
        decimals
      }
    }
  }
}

With this approach, the Subgraph definition needs to be in charge of exposing the relevant events, and the GraphQL schema is in charge of filtering and state management for each connection.

In addition, GraphQL Subscriptions are usually an extension to a GraphQL query: you perform an initial query, and only then subscribe to more data or specific data changes.


The initial thinking process was trying to figure out what dApps need: do they need to know that the data has changed, or do they need to know why the data has changed?

Since Subgraph developers can already manage the lifecycle of why the data has changed (either using Substreams or just the dataSources definition), we needed a way to let Subgraph consumers know that some data has changed, and to make receiving real-time data easy and accessible.

Subgraph developers and dApp developers are often different entities: you can be just a consumer of a Subgraph, without the ability to manage its events, schema, or definition.

With GraphQL subscriptions, we might have a limited event catalog for the consumer. With @live queries, the power of choosing the real-time views is in the hands of the consumer.

SSE: The user-facing transport

For the transport layer, we had two options: WebSocket or SSE (Server-Sent Events). We decided to go with SSE for the real-time transport for the following reasons:

  • WebSocket opens a two-way channel, while SSE is one-way (Server → Client). We only need the ability to push data from the server to the client.
  • WebSocket uses its own protocol and requires a separate connection, while SSE can leverage the existing HTTP layer.
  • WebSocket is not (yet) an effective solution: it’s heavy and requires special setup for mobile apps. SSE acts as a regular HTTP request with multiple responses.
  • WebSocket requires a complex setup for consumers, while SSE can simply use the fetch API.

With SSE setup, we can also easily track the health of the connection (since it acts as a regular, long-living HTTP request) and ensure the delivery of responses.
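As a sketch of how a consumer might read such a stream, the following TypeScript parses raw SSE frames into events. This is an illustrative sketch only, not part of the proposal; the field names follow the SSE wire format ("event:" / "data:" lines, events separated by a blank line):

```typescript
// Minimal SSE frame parser: splits a buffered chunk of an SSE stream
// into complete events and returns any incomplete remainder, which
// should be prepended to the next chunk read from the socket.
interface SseEvent {
  event: string; // e.g. "ping" (keep-alive) or "message" (default)
  data: string;  // raw payload, e.g. a JSON-encoded GraphQL response
}

function parseSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  // Events are separated by a blank line; the last part may be incomplete.
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  for (const part of parts) {
    const ev: SseEvent = { event: "message", data: "" };
    for (const line of part.split("\n")) {
      if (line.startsWith("event:")) ev.event = line.slice(6).trim();
      else if (line.startsWith("data:")) {
        ev.data += (ev.data ? "\n" : "") + line.slice(5).trim();
      }
    }
    events.push(ev);
  }
  return { events, rest };
}
```

In a browser or Node client this would sit behind a regular fetch call: read the response body with a TextDecoder, feed each chunk through parseSseBuffer, and ignore "ping" events.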

@live queries execution

To implement @live queries in an easy way for consumers, we are going to extend the Gateway’s integration with indexers and graph-node instances.

  • The Graph Gateway will receive a GraphQL request and will detect if it’s a @live query (by traversing the GraphQL operation AST)
    • If the request does not include any @live directive → a normal execution is performed.
    • If the request includes one or more @live directives defined on fields, then:
      • If the request does not include Accept: text/event-stream header, an error will be thrown.
      • The Graph Gateway will respond with the following HTTP attributes:
        1. Data description headers: Cache-Control: no-cache, Content-Encoding: none, Content-Type: text/event-stream and Connection: keep-alive.
        2. A periodic message with the payload: event: ping will be sent to keep the SSE connection alive
        3. The Graph Gateway will create a clean GraphQL operation (without the @live directives) and execute it periodically (polling) against graph-node instances.
        4. The Graph Gateway will store the hash of the root GraphQL objects and use it to match every new polling response against the stored hashes.
          1. In case of a hash change, the new version is stored and an event is emitted to the stream with the payload: data: {"data": { }, "errors": [] }
        5. In case of a socket disconnect event, the Graph Gateway will stop polling and drop the in-memory hashes for the given queries.
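The hash-and-compare step above can be sketched as follows. This is a minimal illustration, assuming the Gateway keeps one in-memory hash per active query; the SHA-256-over-JSON scheme is an assumption, not something this GIP mandates:

```typescript
import { createHash } from "node:crypto";

// Hash a polled GraphQL response so it can be compared cheaply
// against the previously stored hash for the same query.
function hashResponse(response: unknown): string {
  return createHash("sha256").update(JSON.stringify(response)).digest("hex");
}

// In-memory store: query id -> last seen response hash.
// On socket disconnect, the entry would be deleted (step 5 above).
const lastSeen = new Map<string, string>();

// Returns true when the new polling result differs from the stored
// version, i.e. when an event should be emitted on the SSE stream.
function shouldEmit(queryId: string, response: unknown): boolean {
  const next = hashResponse(response);
  if (lastSeen.get(queryId) === next) return false;
  lastSeen.set(queryId, next);
  return true;
}
```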

Technical overview of the execution flow

Example of an input query that will trigger @live workflow in The Graph Gateway:

query tokens {
  pairs {
    token1 {
      id
      name
      decimals @live
    }
    token2 {
      id
      name
      decimals @live
    }
  }
}

This will trigger the following query periodically to graph-node instances that are part of The Graph Network:

query tokens {
  pairs {
    token1 {
      id
      name
      decimals
    }
    token2 {
      id
      name
      decimals
    }
  }
}
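One way to derive this clean query from the annotated one, shown here as a deliberately naive string transform for illustration; a real implementation would parse the document and drop the directive nodes from the operation AST:

```typescript
// Naive sketch: remove `@live` directives from a query string to
// produce the operation that is actually sent to graph-node.
// A production implementation should operate on the parsed AST
// instead of using a regex, to avoid touching strings or comments.
function stripLiveDirectives(query: string): string {
  return query.replace(/\s*@live\b/g, "");
}
```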

A hash will be calculated and stored for every selection-set coordinate that has the @live directive specified: pairs.token1.decimals and pairs.token2.decimals.

A periodic query will be executed to fetch the latest data from graph-node instances, and when pairs.token1.decimals or pairs.token2.decimals changes, we’ll publish a new event to the SSE transport stream.

Side note: we could consider dropping all non-@live fields from the periodic queries, and running only the fields that we actually want to watch.
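Comparing only the @live coordinates could look like the sketch below, which collects the values found at a dot-separated coordinate, descending into list fields such as pairs. The path format and traversal are assumptions for illustration:

```typescript
// Collect every value found at a dot-separated selection-set
// coordinate such as "pairs.token1.decimals", descending into
// arrays so list fields are handled element-by-element.
// The resulting values can be serialized and hashed per coordinate.
function valuesAt(data: unknown, path: string): unknown[] {
  let current: unknown[] = [data];
  for (const key of path.split(".")) {
    const next: unknown[] = [];
    for (const node of current) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object") {
          next.push((item as Record<string, unknown>)[key]);
        }
      }
    }
    current = next;
  }
  return current;
}
```

Hashing JSON.stringify(valuesAt(response, coordinate)) for each watched coordinate, rather than the whole response, would implement the side note above.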

Compatibility layer in graph-node

To allow an easy and consistent development process, and to let users run their own graph-node instances in standalone mode, we’ll need to implement a backward-compatibility layer in graph-node that runs the same process (@live detection, polling-based stream execution, and an SSE transport) and replies with a streamed response based on the SSE protocol.

Integration with Graph Client

The Graph Client acts as a network and GraphQL execution layer between the consumer’s code (or the consumer’s GraphQL client) and The Graph network.

To comply with the SSE-based transport, and with the possibility of getting a streamed response (instead of a single response), the Graph Client network layer will need to perform the following (as part of the GraphQL-Mesh execution layer):

  1. If the outgoing GraphQL operation includes a @live directive, a stream request needs to be constructed:
    1. A header with Accept: text/event-stream will be added to the outgoing request.
    2. An AsyncIterable will be returned from the GraphQL execution layer, where every iterated object will consist of a valid GraphQL response.
    3. If a GraphQL client is configured for the consumer, an adaptation layer will need to handle GraphQL normalized-cache updates (by emitting multiple responses).
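The adaptation step could be sketched as below, assuming each non-ping frame carries a complete JSON-encoded GraphQL response (the payload shape follows the data: {"data": ..., "errors": ...} events described earlier). In the real client this would feed the AsyncIterable rather than return an array:

```typescript
interface GraphQLResponse {
  data?: unknown;
  errors?: unknown[];
}

// Turn raw SSE payloads into GraphQL execution results, skipping
// keep-alive frames. Each returned object is a valid GraphQL
// response that can be handed to the client's cache layer.
function toGraphQLResponses(
  frames: { event: string; data: string }[]
): GraphQLResponse[] {
  return frames
    .filter((f) => f.event !== "ping" && f.data.length > 0)
    .map((f) => JSON.parse(f.data) as GraphQLResponse);
}
```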

Changes to graph-node

Besides the changes described above for the compatibility layer, we’ll need the following changes implemented in graph-node:

  1. Add the SDL definition for the @live directive.
  2. Consider dropping the broken GraphQL Subscription implementation and removing it from the schema (note: this is a breaking change).
  3. Improve the HTTP layer to support streamed responses based on the SSE protocol.
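For item 1 above, the SDL definition might look like the following; restricting the directive to field positions matches the usage in this GIP, though the exact locations are an assumption to be settled during implementation:

directive @live on FIELD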

Query costs

To address the changes in the query execution flow with the new stream-based responses, The Graph Gateway will have to check for deterministic responses for every event emitted on the stream. This will also lead to a change in the way the Gateway calculates query fees.

This part of the GIP still needs to be worked out.


Would this mean that any user with an application open would separately receive the query, or would it effectively be one push to all open clients? From a query-cost perspective, it seems we would charge per client that the data was pushed to.

Has there been any discussion as to how a subgraph with multiple indexers would serve this (e.g. round robin, first come first serve etc.)?

hey @dotansimha thanks this is great! The @live pattern does seem like a good fit. I was wondering about the proposed implementation. Would it be possible to rely wholly on a graph-node implementation of the @live queries, with the Gateway acting more like a passthrough between the consumer and an indexer, rather than duplicating the functionality?
There are some edge cases to consider (e.g. if the indexer drops the connection), but I wonder if they could also be streamlined by some changes to graph-client.

A concern was raised at the R&D Retreat that live queries may not be a good fit for our protocol because live queries imply a long-lived relationship between a consumer and an indexer.

I’m surfacing this concern in case it resonates with someone who would like to defend that stance. This concern doesn’t resonate with me, though, because the typical use-case would be for UIs to auto-refresh for a few minutes. The “long-lived” nature of the request is opt-in, presumably cancelable, and any consumer uncomfortable with that could use polling. If connections lived for months, maybe this would be a concern for me, but the default behavior of the library could be to reconnect every hour or something. Assuming Ethereum block times, 1 hour would be a commitment equal to about 250 polls.

Interestingly, the proposed architecture of having a Gateway turn a subscription into a series of polls alleviates this concern in the short term since the gateway would select an indexer with each poll.

However, the question remains because of the implications of introducing the API if we transition to a world without the gateway role. My thinking on this has shifted in the last couple of years. Initially, the gateway role was designed to be deprecated once we could support things like payments, onboarding, and light clients in the browser. But now, I think gateways offer other valuable services that are harder to replicate. These include service discovery, cross-checking at scale, a holistic view of the world for indexer selection, etc. These require a lot of data to function well, and the gateway amortizes the costs effectively across many consumers.


I think it would be more expedient to start by implementing as proposed in the GIP. Assuming that the graph-node implementation uses a library for diffing GraphQL that the Gateway can re-use, this should be relatively straightforward.

Unfortunately, a pass-through connection may not be straightforward. We would need to figure out

  • Payments: Can we stream payments on a one-way connection?
  • Fisherman cross-checking: Do we need to reconstruct the response without deltas to cross-check, or does a live query assume a new requestCID for the attestation that assumes the previous response somehow?
  • Indexer selection: Are there additional criteria for selecting indexers with subscriptions? Do we monitor the service and kill a subscription at some point to route it to a new indexer? Under what conditions?

These may all be possible, but it feels like the bigger lift compared to a polling loop in the gateway.

I think given those concerns, if all the Gateway is to do is identify the “live” parts of the query, then poll indexers and check the returned data against what it has seen before, my preference would be to implement live queries purely as part of the Graph Client, rather than in the Gateway. This would mean more network requests in aggregate, but removes the need for anything stateful.

I am pretty strongly against adding more functionality to the Gateway except where absolutely necessary, and I am not sure what polling in the Gateway does to improve the experience here, apart from maybe reducing latency (saving a hop from client to gateway).

I also feel like the Fisherman’s ability to cross-check and validate things is harder (and not discussed on the GIP), if there is this additional logic on the Gateway, between the client and the indexer.

Interested in your thoughts on that idea @dotansimha


We started experimenting with adding @live directly in graph-client, just handling the polling there. This seems to work pretty well, and the consumer can decide on the polling rate and what to watch.
I think shipping now with graph-client might be simpler and create a lot of value with lower complexity, without major changes in graph-node or the gateway.
