A process for specifying the subgraph API version and feature support matrix

As mentioned at The Graph Protocol Townhall last week, I’ve been working on a GIP for a process to more clearly specify what the official behavior of the subgraph API should be in the protocol, and how the respective features of the subgraph API relate to supported functionality in the protocol.

This is especially salient as, during the recent subgraph migration, some subgraphs have been deployed that include features only partially supported in the protocol.

The GIP can be found on the branch zerim/subgraph-api-versioning in the GIPs repo. For convenience, the Abstract and Motivation of the GIP are pasted below.

There are still a couple of sections that are WIP, but any feedback at this stage is welcome!


Abstract

This proposal defines a process for defining the canonical behavior of the subgraph API in the protocol as well as establishing the matrix of subgraph API features and their corresponding supported protocol features.

Motivation

A core value proposition of The Graph, as a decentralized protocol for indexing and querying public data, is that a Consumer can trust the integrity of the work performed by the network, with minimal or zero trust in any individual Indexer. There are a number of techniques for achieving this with varying degrees of trust minimization. These include off-chain reputation systems as well as mechanisms that may be combined with slashing, such as arbitration, refereed games and cryptographic fraud or validity proofs.

In general, the more trustless the mechanism, the greater the research and engineering effort required to implement it. This proposal therefore describes a support matrix comprising which subgraph API features can be used in conjunction with which features of the protocol, as determined by the strength of the techniques available for guaranteeing the integrity of said features. Having this granular support matrix allows new subgraph API features to be continuously and immediately added to the protocol with additional protocol features supported for those features in later stages–all while driving query volume for these features to the decentralized network, as opposed to any centralized services that might be used to expose these features.

Additionally, all the mentioned techniques, with the notable exception of reputation systems, require that the work Indexers perform be defined deterministically. Therefore, this proposal also describes how a canonical version of the subgraph API may be established via decentralized governance. This is a hard requirement for supporting protocol features such as disputing and slashing Indexers for invalid indexing or query work.


Motivating Use Case
The issue of query versioning has recently become more urgent to properly support the block_gte feature and fix the “block wobble” problem. So, we will need to spend some time on this GIP and implement any changes it requires.

For context, the availability of the block_gte parameter would change the result of existing attested queries. The query { __schema { types { name … } … } } previously would not include the block_gte parameter in the result but now should. In addition, previous versions of graph-node should attest to an error response for any query utilizing the block_gte feature, because such a query was invalid: it did not conform to the schema. The same request against a new version of graph-node should instead respond with data but mark the response non-attestable because the query is, by its nature, non-deterministic. To support this and provide N+1 support, we need query versioning.
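A minimal sketch of why an additive schema change still breaks attestations: the attestation covers a hash of the response, so any change to the introspection result yields a different hash. The type name, field lists, and the use of SHA-256 over canonical JSON are assumptions made for illustration only; they are not the protocol's actual hashing scheme.

```python
import hashlib
import json


def response_digest(response: dict) -> str:
    """Hash a canonical JSON encoding of a response.

    SHA-256 over sorted-key JSON is a stand-in chosen for this sketch,
    not the hashing actually used for attestations.
    """
    canonical = json.dumps(response, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def introspection_result(input_fields: list) -> dict:
    """Build a hypothetical, heavily trimmed introspection response."""
    return {
        "data": {
            "__schema": {
                "types": [{"name": "Block_height", "inputFields": input_fields}]
            }
        }
    }


# Hypothetical results for the same query from two graph-node versions.
old_response = introspection_result(["hash", "number"])
new_response = introspection_result(["hash", "number", "number_gte"])

old_digest = response_digest(old_response)
new_digest = response_digest(new_response)

# The query text is identical, yet the digests differ.
print(old_digest != new_digest)
```

Both responses are valid for their respective versions, so a verifier needs to know which version was intended before it can decide which digest is the correct one.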

This use-case raises some issues with the GIP as written, which will need to be addressed.

Issue: Queries cannot be versioned with feature detection alone
As shown above, feature detection is insufficient for versioning queries because a new query feature added to graph-node will necessarily change some responses if we remain GraphQL compliant.

It is standard to add versioning information outside of the GraphQL API when breaking changes are unavoidable.

Issue: Allocations span multiple protocol versions

The protocol version is baked into the EIP-712 domain hash used for signing responses. The contracts are hardcoded to specify that version as “0”. If the contracts are upgraded to change the protocol version, the change will take effect at a single knife-edge moment under the current implementation.

Since allocations, epochs, and dispute periods are long-lived, this presents a problem for attestations because the contracts need to deal with attestations signed before the upgrade.

The GIP specifies that an epoch number is used to signal the end of life for a previous graph-node version. This rule is hard to enforce because it would not be an attributable fault for an Indexer to return old query responses in a new epoch since the allocation signer does not have sufficiently fine-grained timing to identify an epoch. The only way for Indexers to be compatible with this GIP would be to close and reopen all allocations whenever a new version is released.

Issue: Knife-edge protocol upgrade responsibility has been moved to the client
The client needs to be able to validate an attestation. To do so securely, they need to know the message hash a priori. To know the message hash, they also need to know the protocol version.

Idea: Protocol version in url
The client could specify the desired protocol version for a query in the url.

This idea solves all of the above issues. But, it has a drawback: we need a mechanism to signal what protocol versions the Indexer supports. One way to signal supported versions is the status API.
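As a rough sketch of the signalling half of this idea, a client could filter Indexers by the versions they report before sending a versioned query. The `IndexerStatus` shape, its field names, and the example URLs are hypothetical; the thread does not define what the status API would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexerStatus:
    """Hypothetical summary of what an Indexer's status API might report."""
    url: str
    supported_versions: frozenset


def indexers_supporting(statuses: list, version: int) -> list:
    """Return the URLs of Indexers that signal support for `version`."""
    return [s.url for s in statuses if version in s.supported_versions]


# Made-up Indexers and version sets, for illustration only.
statuses = [
    IndexerStatus("https://indexer-a.example", frozenset({0})),
    IndexerStatus("https://indexer-b.example", frozenset({0, 1})),
    IndexerStatus("https://indexer-c.example", frozenset({1})),
]

print(indexers_supporting(statuses, 1))
```

Because the client picks the version up front, it already knows which version any returned attestation must be checked against.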

Idea: Protocol version in allocation
Just putting this here for posterity. The idea was that the allocation could specify a protocol version that would serve as a way to signal to the client what would be supported and identify what protocol version a query should use for attestation.

This idea doesn’t work because the contracts need to know the protocol version to discover the allocation by means of recovering the signer. Therefore we can’t look up the allocation to determine the protocol version in the contracts.


As @That3Percent explains, the Dispute contract uses the signed query receipt to recover the Allocation and then the Indexer that is being disputed.

Signed(QueryReceipt) -> Allocation -> Indexer

This works because the QueryReceipt is signed with the private key that corresponds to the Allocation.

The Attestation uses EIP-712 with a DOMAIN_VERSION_HASH set to zero. To accommodate the multiple versions in query responses simultaneously we could add the VERSION to the _attestationData in createQueryDispute() and then parse and recover the signer accordingly.


Thanks for raising @That3Percent, and for the context @ariel. This is a good test case of what is a general challenge, particularly as we add functionality to the generated subgraph graphQL API.

I am hesitant to put more burden on the client to request a given protocol version, and indeed to be changing the protocol version for every new field added to the graphQL schema.

To re-summarise the challenge, adding new fields to the graphQL schema means that two different versions of graph-node (valid per the protocol’s supported version range) could give different responses to the same query (in the case of a number_gte query, a filtered view, or an error). ← Let me know if I have misunderstood the challenge.

I wonder if the graph-node version that is being run could be included in an extensions field in every query response (described here and here, though I couldn’t find it in the graphQL spec).

This would then mean that there would be precisely two deterministic responses to a given query (before and after the feature is introduced), which could easily be verified and tested based on the version in the extensions field.
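A minimal sketch of that check, assuming the response carries the version under a hypothetical `extensions.graphNodeVersion` key and that the feature appeared in a made-up version `0.26.0`. Neither the key name nor the version number comes from graph-node.

```python
# Hypothetical version in which the new query feature was introduced.
FEATURE_INTRODUCED_IN = (0, 26, 0)


def parse_version(text: str) -> tuple:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def expected_behaviour(response: dict) -> str:
    """Return 'data' or 'error' based on the version the Indexer reports."""
    version = parse_version(response["extensions"]["graphNodeVersion"])
    return "data" if version >= FEATURE_INTRODUCED_IN else "error"


def is_consistent(response: dict) -> bool:
    """Check that the response matches the behaviour its reported version implies."""
    actual = "error" if "errors" in response else "data"
    return actual == expected_behaviour(response)


old_node = {"errors": [{"message": "unknown argument"}],
            "extensions": {"graphNodeVersion": "0.25.2"}}
new_node = {"data": {"tokens": []},
            "extensions": {"graphNodeVersion": "0.26.0"}}

print(is_consistent(old_node), is_consistent(new_node))
```

Note that this check trusts the version the Indexer reports about itself.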

Meanwhile the indexer selection algorithm could (does?) have a preference for non-error responses (in this case more up-to-date graph-node versions), and maybe even a loose preference for higher graph-node versions in any case.

There may be network or attestation-related nuances that make this suggestion unsuitable, so I welcome corrections!


Yes, you understand the issue clearly @adamfuller.

I appreciate your motivation to keep the client API simple by not requiring the client to specify a protocol version. The alternative given (the Indexer determines the version) has drawbacks that may be hard to overcome.

Only the client knows what the expected behavior is. Even the latest behavior (if the Indexer happens to know it!) is not necessarily what the client expects. The Gateway cannot even prefer non-error behaviors because the error is attestable and therefore a valid response. (What the Gateway actually prefers is attestable responses, not non-error responses - so in the case of number_gte the Gateway would prefer the error because only the error is attestable). Sometimes we may even transition a previously valid response to an error. (Eg: I would like to deprecate skip).

It’s also a security hazard to let the Indexer determine the correct behavior. If the client cannot verify correctness it leaves the Indexer to do whatever they want. For example, if the Indexer returns something unexpected but they return a future (or even made up) protocol version what does the client then do to verify the response? They may not even be able to cross-check if other Indexers are using a non-made up protocol version.

There is a long-standing, battle-hardened tradition on the web of specifying API version information in the request to avoid breaking clients. I had suggested using a header, but I think this is usually done in the url. I’ll update the original post to reflect that. Using the url is lower friction than a header.


Thanks for starting this thread @That3Percent. I agree that there should be a way for Indexers to signal the versions of the Indexer software and of the protocol that they are running:

  • Protocol Version
  • Graph Node Version
  • Indexer Service Version (this is also likely to impact what valid requests/responses between a Consumer and Indexer looks like)

Also agree that the Consumer should be able to specify a version to be used in a query, so that the Indexer can respond w/ the proper domain separator.

As for allocations, I don’t necessarily think we need to associate an allocation with a protocol version on-chain, but we can establish a convention that whatever protocol version(s) were valid when the allocation was opened may be used for providing a PoI to close the allocation and also for serving queries. This makes the set of domain hashes that could be used to recover signatures from an EIP-712 signed message finite.

Note that this GIP already has a provision for N-1 support windows so not every protocol upgrade will necessarily be on a knife edge, which is also what allows for ambiguity on what protocol version a particular allocation corresponds to:

(Optional) Graph Council defines N-1 support window
When upgrading the canonical Graph Node version via a Graph Governance Proposal, The Graph Council may specify a support window during which time the previous version of Graph Node may also be considered valid.

This should be specified as an epoch number in which the previous version will no longer be considered valid. This epoch number must be equal to or greater than the epoch number in which the new canonical version of Graph Node becomes active.
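The support-window rule quoted above can be sketched as a small validity check. The function and parameter names are hypothetical, and treating the previous version as invalid from the activation epoch when no window is given is an assumption of this sketch.

```python
from typing import Optional


def is_version_valid(
    version: str,
    epoch: int,
    *,
    canonical: str,
    previous: str,
    activation_epoch: int,
    previous_invalid_from_epoch: Optional[int] = None,
) -> bool:
    """Decide whether `version` of Graph Node is considered valid in `epoch`.

    `previous_invalid_from_epoch` is the epoch in which the previous version
    stops being valid; if omitted, this sketch assumes no support window.
    """
    end = activation_epoch if previous_invalid_from_epoch is None else previous_invalid_from_epoch
    if end < activation_epoch:
        # The quoted rule requires the end-of-life epoch to be equal to or
        # greater than the activation epoch.
        raise ValueError("support window must not end before activation")
    if version == canonical:
        return epoch >= activation_epoch
    if version == previous:
        return epoch < end
    return False


# Hypothetical upgrade from "v1" to "v2", active at epoch 10, with "v1"
# tolerated until epoch 12.
window = dict(canonical="v2", previous="v1", activation_epoch=10,
              previous_invalid_from_epoch=12)
print([is_version_valid("v1", e, **window) for e in (9, 10, 11, 12)])
```

During epochs 10 and 11 in this example both versions are valid, which is the ambiguity about which version an allocation corresponds to.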


One point I would like to clarify and elaborate on:

While this is true, the example being offered here (adding the block_gte field) is not one that I would traditionally think of as a “breaking change” according to GraphQL conventions. This is because all previously valid queries would continue to be valid queries and produce the same response after the upgrade. The only queries that are broken are previously invalid ones that would now produce a valid response rather than an error. While in some domains it is common for applications to take dependencies on error behaviors of a library or API, this is not common for this type of error w/ GraphQL, where an error such as providing a non-existent field in a query means that the entire query fails to return data and therefore is not particularly useful for integrating into an application.

For me, this raises two questions/decisions:

  • Do we really want to treat a purely additive change like block_gte (not a breaking change in the sense I define above) as a breaking change that requires the Graph Council to specify a new official version of Graph Node on-chain and bump the protocol-wide version?
    • Or do we want to allow Graph Node to introduce non-breaking changes between major versions (and protocol versions) without the overhead of requiring a Council vote and requiring all Indexers to upgrade in lock step each time?
  • Should errors for unsupported fields in a query really be attested?
    • Or should this just be a sign to the Consumer that they must select an Indexer running a newer version of Graph Node, and also have the Graph Node version served at the Indexer status API as I propose above?

For both of the above questions I’m inclined to go with the latter approach, to allow the Graph Node core developers to iterate more rapidly w/o the overhead of going through The Graph Council every time.

A final consideration I would add to this is that it used to be the case that an attested response was required to unlock query payments in the state channels, so-called conditional micropayments. In later designs, this is no longer the case and instead we rely on the indexer selection algorithm (ISA) and local reputation scores as the primary incentive for Indexers to produce attested responses, when appropriate.

So the only real value in having an “unsupported field” error be attested to is if we think some harm to the Consumer may be done by tricking them into thinking the error being attested to is invalid. In this context, the Consumer can trivially confirm/reject the validity of the error themselves, without any special knowledge of the results of indexing the subgraph, simply by knowing what query API is supported for a given Graph Node version. If a Consumer sees an Indexer erroneously producing an unattested “field not supported” error, the Consumer may lower their local reputation for that Indexer, exactly the same as if the Indexer had not responded at all or had produced a seemingly valid response but with no attestation.

This is false. The example above shows that the request for the schema was a valid, attestable query both before and after the upgrade but produces a different response (one response with the new feature listed in the schema and one response without). Even if we did not have a case today which was “more breaking” than the provided example, it would be a desirable feature of our versioning capabilities to be able to make such breaking changes as needed.

I would also like for us to have the mindset that for pure functions errors are not “special”. Treating errors as non-special simplifies thinking and leads to better outcomes. It is desirable to have attested errors. There’s been a lot of effort and discussion around this and it ties in to many conversations from security to ISA / QoS. Removing attestations from errors is a non-starter.

The complication arises not because of GraphQL but because of the sensitivity of attestations to minor changes in the output due to the use of hash functions. (Verifiable queries might not improve the situation). So it may be useful to step back and ask not what is a breaking change for GraphQL but what is a breaking change for attestations. Then the answer becomes “almost everything”. This would be true regardless of whether or not GraphQL was used.


To add clarity and to emphasize that this is a breaking change for attestations only, I think we should rename “protocol version” to “query attestation version” both conversationally and in the contracts. If you think of this as the “query attestation version” and not the “protocol version” you will come up with different answers to some of your questions and concerns. The point I was trying to make earlier by saying that attestations “go away” for actual protocol version upgrades (eg: a feature moving from attestations to verifiable queries) is that this EIP-712 domain hash is only used for attestations and never used for verifiable queries or anything else in the protocol. So the scope of affected areas for this version bump is much smaller than the name implies.

A minor note: the PoI is not signed and the query attestation version is not relevant to PoI.


A council vote is required by the GIP regardless of the concern around attestations. A feature moving from “Experimental” to “Query Dispute Arbitration” requires a vote - and that is what block: { number_gte } is.

In the proposal that the client specifies the version with the query, upgrading in lock-step is not necessary for either the client or the Indexer. The Indexer can answer queries for ranges of protocol versions with backward compatible code all the way to version 0 and there is no fundamental reason to deprecate N-1 versions (possibly ever). That’s a major advantage of client specified versions.
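A minimal sketch of an Indexer answering the same query under different client-specified versions with backward-compatible code. The version number at which number_gte becomes available, the `attestable` flag, and the response shapes are all hypothetical; the two behaviours mirror the motivating use case earlier in the thread (older versions attest to an error, newer versions return non-attestable data).

```python
# Hypothetical query attestation version in which number_gte is introduced.
NUMBER_GTE_SINCE = 1


def answer(uses_number_gte: bool, requested_version: int) -> dict:
    """Answer a query according to the version the client asked for."""
    if uses_number_gte and requested_version < NUMBER_GTE_SINCE:
        # Under older versions the query does not conform to the schema,
        # so the error is the deterministic, attestable response.
        return {"errors": [{"message": "unknown argument number_gte"}],
                "attestable": True}
    if uses_number_gte:
        # Under newer versions the query is served, but the result is
        # non-deterministic and therefore marked non-attestable.
        return {"data": {"placeholder": True}, "attestable": False}
    # Queries that do not use the feature behave the same in every version.
    return {"data": {"placeholder": True}, "attestable": True}


print(answer(True, 0)["attestable"], answer(True, 1)["attestable"])
```

Since the version is part of the request, neither side has to upgrade in lock step: the same Indexer build can serve version 0 and version 1 clients side by side.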

To clarify the implications of this approach as I understand them - Graph Node would need to introduce graphQL API versioning (currently there is only the latest), and the ability to answer queries for historical versions as well as the latest. That versioning could be indicated in the URL:

https://gateway.thegraph.com/api/[api-version]/[api-key]/subgraphs/id/[identifier]

Any feature that changed the graphQL API would require a version bump (even if it is simple addition of a field).

This pattern (for prior version compatibility) exists already in Graph Node for the mapping apiVersion. It would add complexity in Graph Node’s graphQL implementation, and may have some operational or performance implications (I have not fully investigated this). And this would not be about conforming with graphQL compatibility (which allows for simple additive changes), but protocol compatibility.

Let me know if you have a simpler implementation in mind @Brandon and @That3Percent . This does not seem too onerous, and would seem to simplify the situation by creating entirely deterministic query responses. But would be great to clarify if this is a requirement ASAP, given in-progress and planned changes to the graphQL API.


@adamfuller Your summary is accurate.

I might add that it could make sense to use semver for the API version. This gives us a way to make what would be considered to be breaking changes to the actual GraphQL API instead of just breaking the attestations with simple additions. This is not strictly necessary, but it signals to the frontend developer how much they need to pay attention to upgrading.

I think for compatibility reasons we may also use url parameters instead of adding a parameter in the middle. It’s possible to support either way, but putting optional (for backward compatibility) parameters in the middle can set a precedent which would make parsing harder.

https://gateway.thegraph.com/api/[api-key]/subgraphs/id/[identifier]?api-version=[semver]
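A minimal sketch of reading the optional api-version parameter from such a URL. The default version used when the parameter is absent, the placeholder key and identifier in the example URLs, and the simplified MAJOR.MINOR.PATCH parsing (no pre-release or build metadata) are assumptions of this sketch.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical version assumed when a client sends no api-version at all.
DEFAULT_API_VERSION = "0.0.0"


def api_version_from_url(url: str) -> tuple:
    """Extract api-version as a (major, minor, patch) tuple, with a default."""
    query = parse_qs(urlsplit(url).query)
    raw = query.get("api-version", [DEFAULT_API_VERSION])[0]
    parts = raw.split(".")
    if len(parts) != 3 or not all(part.isdecimal() for part in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {raw!r}")
    return tuple(int(part) for part in parts)


base = "https://gateway.thegraph.com/api/KEY/subgraphs/id/IDENTIFIER"
print(api_version_from_url(base + "?api-version=1.2.0"))
print(api_version_from_url(base))
```

Keeping the parameter in the query string means existing URLs without it still parse, which is the backward-compatibility point made above.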


@adamfuller has opened a corresponding GitHub issue for query versioning here: Add graphQL versioning to graph-node · Issue #3024 · graphprotocol/graph-node · GitHub

There is now a GIP for this feature here: GIP-0024: Query Versioning - #9 by That3Percent