N-1 to N Solution: Staging Version

Problem Statement

When developers publish a new version (N) of their subgraph, there is a period of instability while their front end is still querying the pre-existing version (N-1). Indexers close allocations on N-1 (which is still active in the FE prod environment) and open allocations on version N. For larger subgraphs, it may take a while for N to sync to the chainhead.

Abstract Solution

When developers intend to upgrade their subgraph, allow them to ‘stage’ the deployment. This would be an on-chain transaction that includes the new deployment hash. Indexers could respond with an on-chain ‘preparing’ message, indicating that they are syncing the subgraph. This would make it possible to run a status query to see where each indexer is in the sync process.

Then, at the subgraph developer’s discretion, they can wait for indexers to reach the chainhead and then publish the staged version. At this point the GNS will point to deployment N.
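
To make the flow above a bit more concrete, here is a minimal sketch of the staging lifecycle as a plain Python model. Everything in it (the `StagedUpgrade` class, the `Stage` enum, the method names) is hypothetical and only illustrates the idea; it is not an existing contract or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    STAGED = "staged"        # developer announced the new deployment hash
    PUBLISHED = "published"  # GNS now points at the new deployment


@dataclass
class StagedUpgrade:
    """Hypothetical model of one N-1 -> N upgrade (illustration only)."""
    current_deployment: str   # deployment N-1, still served in prod
    staged_deployment: str    # deployment N, announced but not live yet
    stage: Stage = Stage.STAGED
    preparing: set = field(default_factory=set)  # indexers that sent 'preparing'

    def mark_preparing(self, indexer: str) -> None:
        # An indexer signals that it has started syncing the staged deployment.
        if self.stage is not Stage.STAGED:
            raise ValueError("upgrade is already published")
        self.preparing.add(indexer)

    def publish(self) -> str:
        # The developer chooses the moment; afterwards N is the live deployment.
        self.stage = Stage.PUBLISHED
        self.current_deployment = self.staged_deployment
        return self.current_deployment
```

The point of the sketch is that `current_deployment` stays on N-1 for as long as the upgrade is only staged, so production queries are not affected until `publish()` is called.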

Bonus Points Concept

When the developer publishes a staged version, it would be incredible if we enabled functionality to close the allocations of ‘preparing’ indexers and reopen them on version N. This would mean a ‘preparing’ message would need to include a POI as of the time it was published, which may raise some concerns for dispute processes.

The benefit would be that the developer can control the timing and have immediate support as soon as they’re ready to use the new version.

Curation Notes

Signal cannot be added to a staged deployment, so we avoid front-running and back-running.

Cool idea! This reminds me of Blue-Green deployments, which is a common technique of ensuring no downtime for production.

To elaborate on the problem statement a bit, the biggest issue with migrations is being unable to anticipate the N-1 version becoming unavailable.

A UI can keep targeting the N-1 version even once N is published by using a URL like https://gateway.thegraph.com/api/${API_KEY}/deployments/id/${IPFS_HASH}, but this requires manually changing the URL to the N version after N becomes indexed and before N-1 becomes unavailable. Manually updating this URL feels appropriate for a UI, since there may also be staged changes required for compatibility with a new subgraph schema. (As an aside, that URL pattern doesn’t appear in the Graph docs, and I’m always unsure whether it’ll continue to be supported.)

Any solution that allows for predicting when indexers will leave N-1 would be a huge help. This deploy staging flow would make it easier to use the standard https://gateway.thegraph.com/api/${API_KEY}/subgraphs/id/${SUBGRAPH_ID} URL, but I’m not sure it’s worth adding another on-chain tx to the publish flow. I’m happy to keep using the URL for a specific deployment.
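
For anyone following along, the difference between the two endpoint styles can be sketched as two small helpers. The URL patterns are the ones quoted in this thread; the function names and the example IDs are made up for illustration.

```python
GATEWAY = "https://gateway.thegraph.com/api"


def deployment_url(api_key: str, deployment_hash: str) -> str:
    # Pins the UI to one specific deployment (e.g. N-1), regardless of upgrades.
    return f"{GATEWAY}/{api_key}/deployments/id/{deployment_hash}"


def subgraph_url(api_key: str, subgraph_id: str) -> str:
    # Follows whatever deployment the subgraph currently points to.
    return f"{GATEWAY}/{api_key}/subgraphs/id/{subgraph_id}"


if __name__ == "__main__":
    # Placeholder values; substitute a real API key and IDs.
    print(deployment_url("my-api-key", "QmExampleDeploymentHash"))
    print(subgraph_url("my-api-key", "ExampleSubgraphId"))
```

With the pinned form, switching from N-1 to N means changing `deployment_hash` in the UI config; with the subgraph form, the switch happens whenever the new version is published.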

Thanks for the feedback Peri!

I’m curious: if you were able to control the timing of the upgrade from version N-1 to N once you knew indexers were ready to serve, would you be more inclined to use the GNS endpoint https://gateway.thegraph.com/api/${API_KEY}/subgraphs/id/${SUBGRAPH_ID} rather than deployment endpoints?

I’d def be more open to that. Though I actually like the specificity of the deployments/ endpoint. Feels more intuitive to use that than to try timing a tx on the Graph side with deploying a new UI version

I see both sides of this.

We provide both an Indexer service for subgraphs, as well as deploying subgraphs to the gateway which we would like other indexers to index.

As an indexer, I would like to be notified when any of the gateway deployments have a new subgraph deployment added. The notification should include details about whether this is a test version or something that is going to supersede the current deployment, plus an ETA on when the new deployment is going to be published.

There have been a number of times when subgraphs deployed to the gateway were updated without the indexers being notified, so we end up indexing something that isn’t being used and miss the new deployment.

As a subgraph developer/deployer, I’d like to be able to stage a new deployment and flag that it is due to be published on date X in the future (which should default to 5 days PLUS the time it takes to index the subgraph). I would like this to notify any of the indexers that are currently indexing the currently deployed subgraph for that gateway endpoint (provided they signed up for updates).
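
As a quick illustration of the default described above, the publish date would just be the staging time plus a 5-day notice period plus the estimated indexing time. The names below (`NOTICE_PERIOD`, `default_publish_date`) and the 36-hour sync estimate are assumptions for the example, not part of any existing tooling.

```python
from datetime import datetime, timedelta, timezone

# Notice period suggested in the post above.
NOTICE_PERIOD = timedelta(days=5)


def default_publish_date(staged_at: datetime, estimated_sync: timedelta) -> datetime:
    # Default = staging time + 5 days + time needed to index the subgraph.
    return staged_at + NOTICE_PERIOD + estimated_sync


if __name__ == "__main__":
    staged_at = datetime(2023, 1, 1, tzinfo=timezone.utc)
    # Hypothetical estimate: the subgraph takes about 36 hours to sync.
    print(default_publish_date(staged_at, timedelta(hours=36)))
```

How the indexing time would be estimated is left open here; it could come from the previous version's sync time or from indexers' own reports.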

For juicebox, we deploy loads of WIP subgraph versions for testing from EOA-owned subgraph studios. Once a version is ready, we’ll deploy that same subgraph (same Qm… ID) from our multisig-owned studio, and it’s immediately ready to publish—having already been indexed by the graph

It’s rare that we have a subgraph version ready without publishing it, but when that does happen it would be cool to be able to notify “we’re gonna publish this in a few days” so the network could start indexing it, without having to worry about our existing published version going offline. Would even be willing to pay a tax in the case of deciding not to publish a “staged” version, distributed to indexers who have already picked it up

But that said, this doesn’t feel super important. We may end up saving a bit of time to publish, but maybe not worth it to add another tx to the flow. It could take us 1-2 days to get a tx signed via multisig, meaning we might not even execute the staging tx until we’re ready to publish anyway

Hey DataNexus, thank you for putting some thought into the subgraph upgrade process; it definitely needs to be improved.

I have a few questions about the proposal. Is the purpose of the preparation stage to signal intent by the app developer but with no economic consequence? From what I see, they will not be reallocating the signal.

I think we should reserve on-chain interactions for managing value flows that encode “hard” incentives (and penalties), keeping the on-chain game as small as possible to reduce the cost of operations. Anything that is not part of the economic game but still helps with “soft” coordination should move to the GraphCast network proposed by GraphOps.

GraphCast will always be faster and more flexible. By participating in the p2p network, every indexer benefits from the information exchange. I’m curious whether the idea was to allow other actors to participate; I would like to get @chris’s opinion about this.

Thanks for responding, Ariel. Let me explain further:

The intent would be to have a message that informs indexers when a new upgrade is coming, giving them a chance to start syncing it before it has economic impact.

Since currently >99% of signal is set to auto-migrate, the incentives drop off N-1 and go purely to N upon upgrade. This leads indexers to allocate to a subgraph while they are still syncing it, while N-1 support dwindles as indexers realize the upgrade has occurred. Because N-1 is often still used in production (since N is still syncing), developers risk data instability.

I think GraphCast would be a great channel for this message to occur! Is this something that can be integrated into the studio for developers to access? Ideally there would be some way to access the status query for responding indexers so the developer could see where indexers are at in the sync process.

Exactly, because a subgraph developer communicating the intent to upgrade does not encode economic penalties in your proposal, I think GraphCast could be a good channel to broadcast this information and propagate across the network. It is still early days of GraphCast but by monitoring that channel different actors could react faster to changing conditions in the network.

More detail about GraphCast: GRC-001: Graphcast - a Gossip Network for Indexers

I think this is a great proposal, thank you @DataNexus. I suggested basically this same idea internally for this problem. I personally prefer the option of scheduling an upgrade in a way that’s enforced on-chain if there’s a reasonable way to do it. When you publish a new version, maybe you could pass in an argument to decide the epoch at which it would exit staging, and the signal would auto-migrate on that epoch. If something like that were doable, you wouldn’t need an additional tx. @ari, thoughts on the feasibility of what I described?
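
The epoch-based variant boils down to a single comparison. This is only a sketch of the rule as described above, with hypothetical names (`active_deployment`, `exit_epoch`); how it would actually be enforced on-chain is the open question.

```python
def active_deployment(current_epoch: int, previous: str, staged: str, exit_epoch: int) -> str:
    # Before exit_epoch, N-1 stays live; from exit_epoch on, N takes over
    # and signal would auto-migrate without a second transaction.
    return staged if current_epoch >= exit_epoch else previous


if __name__ == "__main__":
    # Placeholder hashes and epochs, purely for illustration.
    for epoch in (799, 800, 801):
        print(epoch, active_deployment(epoch, "QmOld", "QmNew", exit_epoch=800))
```

Because `exit_epoch` is supplied at publish time, indexers know in advance exactly when N-1 stops being the target.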

100% agreed, thanks for flagging me here! The long-term intention was definitely that actors outside of Indexers could participate in Graphcast. This sounds like an excellent first use case to make that step.

I do think this is something that can be integrated into the Studio. It would require that a user sign a message (indicating the subgraph identifier, the new deployment hash, and other metadata) with their publisher wallet, which is then gossiped to Indexers. Each Indexer would then verify that the message signature is valid for the publisher and that they care about the upgrade (i.e. most likely because they are already indexing the current deployment hash), and could then automate off-chain indexing of the new deployment.
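
Roughly, the Indexer-side check described above could look like the sketch below. This is not the Graphcast SDK: `UpgradeIntent`, `should_start_syncing` and the field names are hypothetical, and signature verification is left as a pluggable function because the signing scheme is not specified in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class UpgradeIntent:
    """Hypothetical gossiped message announcing an upcoming upgrade."""
    subgraph_id: str
    new_deployment: str
    publisher: str
    signature: str


def should_start_syncing(
    intent: UpgradeIntent,
    indexing: Dict[str, str],  # subgraph id -> deployment hash this indexer serves
    verify_signature: Callable[[UpgradeIntent], bool],
) -> bool:
    # 1. Ignore messages whose signature is not valid for the publisher.
    if not verify_signature(intent):
        return False
    # 2. Only care about subgraphs this indexer already indexes.
    current = indexing.get(intent.subgraph_id)
    if current is None:
        return False
    # 3. Nothing to do if the announced deployment is already being indexed.
    return current != intent.new_deployment
```

In a real setup `verify_signature` would recover the signer from the message and compare it against the subgraph's publisher; here it is just a callback so the decision logic can be read on its own.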

I could even see Indexers gossiping indexing progress back to publishers (slightly more challenging because state is involved), so publishers could see how many of the already allocated indexers have fully synced the new deployment before switching the GNS and the rewards over.
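
On the publisher side, the readiness check could be as simple as counting how many indexers report being at the chainhead. The sketch below assumes progress arrives as a mapping from indexer to latest synced block; that shape, the function names and the `tolerance` parameter are all assumptions for illustration.

```python
from typing import Dict, List


def synced_indexers(progress: Dict[str, int], chain_head: int, tolerance: int = 0) -> List[str]:
    # An indexer counts as synced if it is within `tolerance` blocks of the chainhead.
    return sorted(name for name, block in progress.items() if block >= chain_head - tolerance)


def ready_to_publish(progress: Dict[str, int], chain_head: int, min_synced: int, tolerance: int = 0) -> bool:
    # The publisher decides how many synced indexers are enough before switching.
    return len(synced_indexers(progress, chain_head, tolerance)) >= min_synced


if __name__ == "__main__":
    # Made-up progress reports from three indexers.
    reports = {"indexer-a": 1000, "indexer-b": 998, "indexer-c": 400}
    print(synced_indexers(reports, chain_head=1000, tolerance=5))
    print(ready_to_publish(reports, chain_head=1000, min_synced=2, tolerance=5))
```

The threshold is deliberately left to the publisher, since how many synced indexers are "enough" depends on the subgraph's query volume.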

@DataNexus if you were open to it and have the capacity to develop this, we’d love to work closely with you to deliver a Graphcast Radio implementation. We’re currently rewriting the SDK in Rust after some very promising testing, and expect to have that ready for use by end of January.

I’d love to be a guinea pig for this! I’ll message you on Discord to set up a chat.
