Require that more recent PoIs are submitted in order to collect indexing rewards

Overview

This is a proposal to require that Indexers submit more recent PoIs in order to collect indexing rewards on a subgraph.

Motivation

The current behavior, as specified in the yet-to-be-ratified draft of the Arbitration Charter, is that PoIs should be submitted for the first block of the epoch in which an allocation is closed, or the epoch before that. In effect, this means that PoIs may be submitted that are up to 48 hours behind the chain head.

Last week, the current rules allowed for a situation in which the Synthetix subgraph had no Indexers allocating stake to it that were synced within 5% of the chain head. Effectively, this was an outage from the perspective of the Synthetix subgraph.

In the following Indexer Office Hours it was revealed that there actually were several Indexers fully synced on the Synthetix subgraph; however, they had not allocated stake toward it and thus were not discoverable for sending queries. The stated reason for not allocating stake was that so much stake was already allocated by Indexers (the ones that were >5% behind the chain head) that there was little reason to allocate more.

This proposal aims to address this issue by requiring that more recent PoIs are submitted in order to receive indexing rewards.

How it works

  1. Add a poiBlock field to the closeAllocation method of the Staking contract.
  2. Do not allow indexing rewards to be collected for PoIs that are submitted more than some threshold behind the chain head.
    • There are two ways this requirement might be implemented.

Variation 1 - Enforce at smart contract layer

The requirement to submit recent PoIs may be implemented at the smart contract level. This would leverage the blockhash global function, which only makes the most recent 256 block hashes available.

This means that the threshold could not be enforced for durations longer than about an hour. That is likely fine for all subgraphs in The Graph today, but could be unnecessarily strict for subgraphs that are only concerned with, e.g., historical data.

An additional downside of this variation is that it would also impose new gas costs on closing allocations.

Variation 2 - Enforce at arbitration layer

Another variation is to enforce the requirements on PoI freshness through the Arbitration Charter. For example, collecting indexing rewards by submitting a PoI that is outside the target threshold could be made a slashable offense.

To mitigate the risk of Indexers accidentally submitting PoIs that fall outside the threshold during periods of high blockchain congestion, this would require adding an additional field to the closeAllocation method in order to revert the transaction if it is not mined in time.
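A minimal Python model of that revert behavior may help make it concrete. This is purely illustrative: the parameter names and the logic below are assumptions for this proposal, not the actual Staking contract API.

```python
# Illustrative model of a closeAllocation deadline check. If the transaction
# is mined after `deadline_block`, the submitted PoI would be outside the
# freshness threshold, so the call "reverts" instead of risking slashing.

class TransactionReverted(Exception):
    """Stands in for an EVM revert."""

def close_allocation(mined_at_block, deadline_block):
    """Revert if the transaction is mined too late for the PoI to be fresh."""
    if mined_at_block > deadline_block:
        raise TransactionReverted(
            "mined at block %d, past deadline %d" % (mined_at_block, deadline_block)
        )
    return "allocation closed"
```

With this shape, a congested network causes a revert (gas spent, but no slashable PoI submitted) rather than an accidentally stale PoI.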

An additional downside of this approach is that it expands the role of the Arbitrator, and generally we are looking to minimize the role of the Arbitrator over time.

Next Steps

  • Get feedback from Indexers and subgraph developers on both of these approaches (cc @Oliver)
  • Evaluate the gas costs of the first variation, using gas optimization tricks (cc @Ariel).
  • Make sure PoI cross-checking in the Indexer Agent would still work even with Indexers submitting PoIs for different blocks (cc @Ford)
  • Evaluate an appropriate default threshold that is large enough to be achievable for Indexers on most subgraphs but small enough to be useful for most subgraph and dapp developers.

Future Work

Future iterations on this proposal could be to allow Subgraph Developers or Curators to specify the threshold behind the chain head for which PoIs would be accepted, to allow for historical or analytics-focused subgraphs that may not be as concerned with data freshness.


Hey Brandon.
Thank you for starting this discussion.
As I said on the Indexer Office Hours, I’m not sure that this will solve the main issue with large Subgraphs.
From my point of view, we should pay indexing rewards only to Indexers who have already fully synced the subgraph, because only these Indexers can serve queries properly with fresh data.
For broken subgraphs, like Opyn (which should be a rare case if a dApp really uses the subgraph), it could be synced up to the last correct PoI before the issue.
The current rewards system incentivizes Indexers to start syncing subgraphs on-chain from the beginning. Why that’s bad:

  1. A subgraph could be broken and unable to sync to the chain head. Formally that’s not a problem for rewards now, but syncing such subgraphs doesn’t provide anything to the network. And once the problem is detected, Indexers must reallocate to a new deployment, spending gas, nerves, and time on additional manual actions.
  2. Some subgraphs can’t be fully synced within 28 epochs at all (old Sushi and Synthetix). In that case, Indexers lose all rewards after 28 epochs until those subgraphs are fully synced, and they risk wiping out all rewards if someone closes their allocation after 28 epochs from the Delegator side (this has happened to some Indexers several times).
  3. Indexers without good knowledge of the network can jump in with huge allocations and heavily skew the reward proportions, but as a result of #2 they will close the allocation with a zero PoI. Good Indexers on the subgraph get fewer rewards, and the first Indexers’ Delegators lose all rewards for 28 days as well. Other good Indexers who fully synced it off-chain won’t allocate to it because of the bad proportions, as you described.
  4. The current system does not incentivize Indexers to figure out how to sync subgraphs faster so they can serve queries as soon as possible, because 99% of subgraphs will sync within 28 epochs with default settings anyway. But dApps will not get their queries served for a long time.

A new rewards system would not heavily cut Indexer rewards, because Indexers would keep allocating to existing subgraphs and collecting rewards there. After they fully sync a new subgraph, they can recalculate their allocations across the network’s subgraphs and everything balances out. Indexers who take care of sync speed and query serving would earn more than Indexers who just sit without taking any action.

Hey @KonstantinRM, I generally agree with your point that requiring a recent PoI to open an allocation would likely improve the data freshness story more than just requiring a recent PoI to close an allocation and collect indexing rewards.

TLDR: I think we could do both and we should take advantage of whatever low hanging fruit we can to improve The Graph’s story here as quickly as possible

I would be interested in hearing some other Indexer opinions on how they would feel about foregoing the rewards that are, at least in theory, accumulated while catching up to the chain head versus only receiving rewards after being caught up to the chain head.

As far as implementation goes, I imagine this variation would imply a few more changes to the smart contracts, as well as the indexer agent’s allocation management. cc @ariel and @Ford, respectively.

This is a candidate implementation for option 1 of this proposal.

It includes the following changes:

  • Add a poiBlockHash to the closeAllocation() function.
  • Whenever a non-zero PoI is presented, closeAllocation() will test that the poiBlockHash matches the canonical chain. If it doesn’t, the function will revert.
  • Emit the poiBlockHash in the AllocationClosed event so that the Arbitration team can verify that the PoI matches the block.

On-chain Check

The solution relies on a for-loop that tests blockhash(n) == poiBlockHash from the latest block number back 255 blocks. That’s the maximum number of block hashes we can test from the contract.

256 blocks = about 1 hour based on average block time.

If we loop back through block.number - 256 and the hash is not found, it means one of the following:

  1. The Indexer presented a random block hash.
  2. They sent the block hash of a reorged chain.
  3. The block hash is outside the 256-block window (too old).
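The check above can be modeled in a few lines of Python. This is purely illustrative: the dict stands in for the EVM blockhash(n) lookup, and the names are assumptions rather than the actual contract code.

```python
def find_poi_block(latest_block, block_hashes, poi_block_hash, window=255):
    """Scan backwards from the latest block over the available window,
    mirroring the on-chain blockhash(n) == poiBlockHash loop.
    Returns the matching block number, or None when the hash is random,
    from a reorged chain, or older than the window."""
    for n in range(latest_block, max(latest_block - window, 0) - 1, -1):
        if block_hashes.get(n) == poi_block_hash:
            return n
    return None
```

A None result corresponds to the revert case in the contract; note that the three failure causes listed above are indistinguishable on-chain.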

Gas Usage

I did some estimations on a local Hardhat node; a call to blockhash(n) costs about 161 gas.
If the validation for-loop needs to verify the full 256 blocks, it will increase the gas cost of closeAllocation by 161 * 256 = 41,216 gas.

Implementation

  1. Is it possible to add the new parameter to the closeMany and closeAndAllocate functions too?
  2. Can you provide some information about the plan for this upgrade? Currently we are using some scripts to close allocations, with the ABI from Abarmat’s gist. Will that be updated too?

I’m not in favor of this proposal as-is because it does not go very far toward solving the problem and introduces costs without clear benefit. We should spend our limited time investigating other solutions to the problem rather than implementing this one.

Problem statement:

Proposed Protocol Change: Shorten the window from 48 hours to less than that (maybe 256 blocks).

Analysis: Since 48 hours behind is much less than 5%, I have to conclude that the effect of this change would be to go from “stale Indexers do not collect rewards” to “stale Indexers do not collect rewards” (No change in incentives or outcome).

A more boring explanation of these events is that people are running the indexer-agent software as we provided it, which allocates before indexing near chain head. The rationale for this is that the Indexer may optimize their rewards if they can catch up in time, but it is not a pro-social default behavior since it may lead to some of these problems. The mundane fix for this would be to change the default behavior of the indexer-agent.

To move from “pro-social defaults” to protocol changes we may instead require a PoI when an allocation is opened. This is a more direct fix than what has been proposed.

The counter-argument to allocation-open PoIs (that Indexers would “forgo” rewards accumulated while indexing) is not a strong argument for me. The number of indexing rewards minted is the same; it’s just a question of how they are distributed. Given a pro-social equilibrium, each Indexer would receive the same amount of rewards over the same time horizon. The current protocol incentivizes behavior which effectively burns rewards, though, meaning that by selecting the current protocol Indexers as a group would be receiving less. Therefore Indexers would benefit from this change.

Another problem may be that Indexers which close with 0x0 PoIs “take” rewards from other Indexers. If a stale Indexer had closed with 0x0 repeatedly that may be a signal to an Indexer that their stake can be ignored when choosing what subgraphs to allocate to. But, economically I don’t think that’s the case right now. (Can someone fact-check this?)

Tangent:
There are some side effects of the decision to couple allocations and PoIs. Allocations are a signal that queries can be served (and tie stake to query volume), but PoIs provide consensus around the database. These aren’t quite the same thing, since it may be beneficial to serve historical queries, or to provide a PoI without serving queries (at least in theory). Having allocations serve two functions leads to a less-than-ideal outcome for either function. There are strong arguments for tying them together, though; it’s just something to keep in mind for this discussion.


Revision: I am for this change in response to a different problem, which is that a time-bandit attack on Ethereum or a slowly mined transaction would currently expose the Indexer to slashing risk. This change would have the effect of negating the Indexer’s rewards in this case instead (which may affect transactions down the line in a time-bandit attack, but that just comes with the territory).

There are flavors of this proposal that I would be amenable to as well in shortening the window for valid PoIs, but my point is that shortening the window is not the most urgent or important part of this problem for helping Synthetix or other subgraphs experiencing similar issues. Being 48 hours behind chain head and collecting rewards is naturally going to be a short-lived condition on the way toward being fully synced, so it’s not the core issue. There’s a theoretical argument that an Indexer could try to stay 48 hours behind, but I don’t know if there’s much merit to that as a strategy, since it would more likely expose them to the risk of lost rewards than it would save on infrastructure costs. Once query fees dominate, this strategy would disappear anyway.

Never attribute to malice that which is adequately explained by ~~stupidity~~ running our software with the default settings.

After offline discussions w/ both @That3Percent and @chris, I’m adequately convinced that:

  • My original proposal does not go far enough in solving the original problem.
  • Requiring a PoI on opening of the allocation, given how indexing rewards work today, is both desirable and necessary.

The points that convinced me are essentially what @KonstantinRM was advocating up above.

To restate in my own words and also elaborate:

  • Even if recent PoIs are required for closing an allocation, an Indexer doesn’t know a priori if another Indexer, or even themselves, will be eligible to submit a valid PoI at the end of an allocation.
  • Currently, when an Indexer submits a “zero PoI,” indexing rewards are not redistributed to the remaining Indexers, so even if an Indexer ends up not collecting indexing rewards, they may still crowd out other Indexers that would have otherwise been able to serve recent data. (This property of how indexing rewards are computed is not trivial to change, but is worth further investigation)

My initial concerns w/ requiring a recent PoI on opening of an allocation:

  1. It removes compensation to Indexers while they are still performing the useful work of catching up to the chain head.
    • Compounding this problem, if Curators rugpull a subgraph, it’s possible an Indexer will fully sync the subgraph while never receiving indexing rewards for it.
  2. It removes an important on-chain signal to Indexers that others already intend to index a subgraph (before doing the work of syncing it entirely themselves).

@chris convinced me that #1 is a non-issue because many Indexers already fully sync subgraphs before opening allocations, because the opportunity cost of missing out on rewards (if you end up needing to submit a zero PoI for example) and allocation gas costs dominate the cost of syncing a subgraph.

I believe #2 can and should be addressed by more P2P signals between Indexers, which serve a number of other use cases as well.

My remaining concerns/considerations with this proposal:

  • Gas costs. I think the recent-PoI requirements, both for opening and closing allocations, need to leverage something similar to what Ariel prototyped above (48 hours of data freshness doesn’t really accomplish our goal). According to @ariel this would add ~40K gas, which represents a roughly 15% increase in gas costs for allocation management, already one of the dominant costs for Indexers. I think this can be mitigated somewhat by making the data freshness requirement stricter, for example 15 minutes, so we only need to scan ~64 blocks on-chain as opposed to 256.
  • Multi-blockchain. There have been discussions in the community recently, led by @adamfuller and @That3Percent, around how to reconcile notions of time across chains for submitting multi-blockchain PoIs. I’d want to make sure that whatever design we come up with here doesn’t become a one-off for Ethereum mainnet subgraphs and has an analogous design that works for multi-blockchain subgraphs.
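For context on the freshness-vs-gas tradeoff, here is a back-of-the-envelope calculation. It assumes an average block time of ~13.5 seconds and the ~161 gas per blockhash call estimated earlier in the thread; both numbers are approximations, not protocol constants.

```python
BLOCKHASH_GAS = 161        # approximate gas per blockhash(n) call (estimate from above)
AVG_BLOCK_TIME_S = 13.5    # assumed Ethereum mainnet average block time

def window_cost(freshness_minutes):
    """Return (blocks that must be scanned, worst-case extra gas) for a
    given data-freshness window."""
    blocks = int(freshness_minutes * 60 / AVG_BLOCK_TIME_S)
    return blocks, blocks * BLOCKHASH_GAS
```

Under these assumptions, a 15-minute window works out to scanning roughly 66 blocks for about 10.6K gas, versus ~41K gas for the full 256-block window.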

Gas costs and multi-blockchain can be accomplished the same way - by checking recency with arbitration rather than on-chain.

We won’t be able to validate recency on-chain for multi-blockchain anyway (assuming we land on the subgraph-specified-bridge design). This data is not necessarily available in all bridges; even where it is, you would need to put a light client for every bridge into our contracts; it is unlikely that APIs for reconciling time across two chains are coherent or useful for our case; and it would be prohibitively expensive on gas.

For Indexer cross-checking, the performance implications of checking 64 or even 256 blocks are trivial with some minor modifications to the query for the PoI. You can do it in a single database load plus N hashes.
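To illustrate that cost profile, here is a toy model. The hash chaining below is not graph-node’s actual PoI construction; it is only a stand-in to show that, after one database load of the base state, deriving and comparing candidates for N blocks is just N cheap hash operations.

```python
import hashlib

def candidate_pois(base_state, block_hashes):
    """Derive a candidate PoI for each of N recent blocks from a single
    loaded state (toy construction, not the real PoI algorithm)."""
    pois, poi = [], base_state
    for bh in block_hashes:
        poi = hashlib.sha256(poi + bh).digest()
        pois.append(poi)
    return pois

def cross_check(submitted_poi, base_state, block_hashes):
    """Accept the submitted PoI if it matches any block in the window."""
    return submitted_poi in candidate_pois(base_state, block_hashes)
```

The single load amortizes across the whole window, so widening the window from 64 to 256 blocks changes only the number of hash operations, not the I/O.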

The counter to the above is that we should be moving away from Arbitration wherever possible. Once we have Verifiable Indexing we would have to reconsider the above. But my expectation is that the tradeoff space will be different then. We may have tools like Ethereum 2.0 available to us, or better bridges, or who knows at that point. Gas costs might be better or worse. Since validating this one piece of data on-chain is nontrivial today and provides only marginal benefit without Verifiable Indexing, my view is to use Arbitration today and re-evaluate when Verifiable Indexing is closer.

Agree, we could enforce data freshness off-chain through Arbitration if all else fails. This has the following drawbacks:

  1. It increases reliance on the Arbitrator, as you mention, which we would like to rely on less. It also makes the job of the Arbitrator far more subjective unless we establish a canonical source of truth for reconciling time across blockchains, which starts to look like a bridging problem again.
  2. It doesn’t go as far in solving the original problem as enforcing data freshness on-chain. Since data freshness would not be enforced in real time, it’s possible that an Indexer (who might get slashed later) could still crowd out other Indexers who otherwise would have allocated, although they would be incentivized not to do so.

At the risk of making this thread more about multi-blockchain PoIs, I have a proposal for enforcing data freshness for such PoIs, inspired by recent conversations with @ariel and @yondon around L2 scaling architectures, that I believe addresses all of our requirements above (except for gas costs, though I believe these can be mitigated):

The core observation is that we are currently trying to do two sets of things on-chain as part of the closeAllocation transaction that might be decoupled:

  1. Close (Reopen) Allocation. This includes paying out indexing rewards, settling query fees, and updating the allocated stake amount.
  2. Verify PoI is Submitted for Correct Block. This includes making sure multi-blockchain PoIs are submitted for blockhashes that are part of the canonical foreign chain and meet our data freshness requirements.

The challenge with doing all of these things on the same chain (i.e., Eth mainnet) is that bridges don’t update that frequently (e.g., 3-6 hours for NEAR, or 7 days on Optimism/Arbitrum).

Even using a trusted Oracle doesn’t improve things substantially because the foreign chain’s finality model may not support knowing whether a block is final in real-time, and so cannot act as a reliable source of truth for data freshness to any other chain.

From an architectural standpoint, the main problem is that we are trying to enforce logic on one chain that is a function of logical time on a completely different chain, which has its own completely separate notion of causality. An analogy might be writing a microservice with logic that takes hard dependencies on the ephemeral, quickly changing state of some other microservice, or vice versa. You might be able to make it work, but why not just put the logic where it makes the most sense?

So my proposal is to put computations where they are most natural:

Design

Opening/Closing/Reopening an allocation

On Foreign Chain

  1. Send a submitPOI transaction w/ the following arguments: blockHash, subgraphDeploymentId, poi, recipientChain, indexerId.
    • Params
      • blockHash - This is the blockHash on the foreign chain (the chain on and for which the PoI is being submitted) that the PoI corresponds to.
      • subgraphDeploymentId - The subgraph for which the PoI is being submitted.
      • poi - The Proof of Indexing (PoI) being submitted for the specified subgraph as of the specified block hash, on the foreign chain.
      • recipientChain - This identifies the chain to which the submitted PoI is sent over a bridge. Right now this would be Eth Mainnet, but in the future it could also include L2s or an app chain, if we partition our allocation management logic across multiple scaling solutions (more discussion pending on this and other threads).
      • indexerId - An identifier corresponding to the Indexer in The Graph protocol as a whole, or perhaps just specifically on the recipient chain (i.e., if alternate addresses are being used for compression purposes).
  • Logic
    • Computes the dataFreshness from the difference between currentBlock() and blockNumber(blockHash).
    • Sends a SUBMIT_POI message, over a bridge, to the recipient chain, which includes the following data, most of which have been defined above:
      • indexerId
      • dataFreshness
      • blockNumber - The block number corresponding to the blockHash
      • (Optional) poi*
      • (Optional) blockHash*
      • * poi and blockHash could be omitted, as long as we are using arbitration, because the Arbitrator could reference the poi and blockHash submitted on the foreign chain. However, they might be needed when we introduce on-chain verifiable indexing, which would likely live alongside the rest of the core protocol logic.
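The foreign-chain side described above can be sketched as follows. The parameter names come from this proposal, not from any existing contract, and Python stands in for contract code; the dict models the chain’s hash-to-number lookup, with a missing key modeling a revert for a non-canonical hash.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubmitPoiMessage:
    """The SUBMIT_POI payload sent over the bridge to the recipient chain."""
    indexer_id: str
    data_freshness: int                 # blocks behind the foreign chain head
    block_number: int
    poi: Optional[bytes] = None         # optional while arbitration is used
    block_hash: Optional[bytes] = None

def submit_poi(current_block, block_number_of, block_hash, indexer_id, poi):
    """Resolve the submitted hash to a block number, compute freshness,
    and build the SUBMIT_POI message."""
    block_number = block_number_of[block_hash]  # KeyError models a revert
    return SubmitPoiMessage(
        indexer_id=indexer_id,
        data_freshness=current_block - block_number,
        block_number=block_number,
        poi=poi,
        block_hash=block_hash,
    )
```

The key design point is that freshness is computed on the chain where the block was produced, so the recipient chain never has to reason about another chain’s clock.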

On Primary (Recipient) Chain

  1. After enough time has elapsed for the message to cross the bridge, send an allocate, closeAllocation, or closeAndAllocate transaction. These work similarly to the functions defined here, except with the following changes:
    • Verifies that the Indexer has submitted a PoI for a recent enough block number on the foreign chain. This threshold could be set per chain, to allow for roll up chains with long finality times.
    • In the case of closing an allocation, verifies that the submitted PoI is for a block number that is some threshold greater than the previously submitted PoI’s block number. This ensures the Indexer actually stayed synced to the chainhead for some minimum period of time.
    • If either of the above two verifications fails, the Indexer either may not open an allocation or may be forced to close the allocation with a “zero PoI,” foregoing indexing rewards.
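The two recipient-chain verifications can be summarized in one predicate. The thresholds would be set per chain as described above; all names here are illustrative, not an existing API.

```python
def may_collect_rewards(poi_block, head_block, prev_poi_block,
                        freshness_threshold, min_progression):
    """True only if the PoI is recent enough for this chain AND the Indexer
    progressed at least min_progression blocks past their previous PoI."""
    fresh = head_block - poi_block <= freshness_threshold
    progressed = poi_block - prev_poi_block >= min_progression
    return fresh and progressed
```

If this returns False, the allocation would be rejected on open, or closed with a zero PoI on close.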

Analysis

There are some obvious drawbacks to the above approach:

  1. Allocation management now requires interacting with multiple blockchains, while accounting for asynchronous message passing between these chains.
  2. Requires writing a minimal bridge wrapper on each new chain we wish to support.
  3. We still incur gas costs of using the bridge. Probably more gas than if we just used the bridge to reference block numbers, though likely less gas than if we had an Oracle continuously maintaining a feed of block numbers at the frequency we would require.

My responses to the above drawbacks:
While #1 adds operational overhead, I don’t believe it is substantially more than the operational overhead of an Indexer interacting with a new chain for the purposes of indexing.

While #2 implies more work and new skills for Graph Protocol core developers, these are skills our ecosystem needs to build anyway. Not only for implementing L2 scaling; for example, if we deploy protocol contracts to an L2 scaling solution that does not already have native bridges to the multi-blockchain chains we wish to support, then we may need to be comfortable enough with the design of these other chains to build bridges ourselves. Also, compared to the work of integrating a new blockchain into The Graph as a whole, I think the incremental work would be minimal, especially once the pattern of building these bridges is well established.

With respect to #3, it’s possible that this design makes more sense when we move to an L2 (we won’t know until we try and evaluate gas costs), but we should do so sooner rather than later… so I think now is the right time to start evaluating such designs.

Looking forward to hearing feedback on this approach.


Some scattered thoughts…

I’ve been using the terms “Data chain” and “Settlement chain” elsewhere to refer to “Foreign” and “Primary” chains. I’m not married to these terms, but I think we should decide and stick with one set. I prefer “Data chain” to “Foreign chain” since “Foreign” is relative. “Data chain” (DC), “Protocol chain” (PC), and “Settlement chain” (SC) are my current picks, e.g. for the PoI chain, a settlement L2, and Ethereum respectively.

I’m not keen on the Indexer having to wait on the bridge to migrate their PoI before closing/opening their allocations. This puts the Indexer’s rewards at risk when there is high latency in using the bridge. Bridge latencies may be very high (e.g., 7 days or more for optimistic rollups, plus multiple days for sequencer bugs/downtime, and unknown time added by low data throughput/interest). These times also affect the dynamism of the network; 7 days is an unacceptably long time to respond to increased load. I would rather split closing the allocation and collecting rewards for it into separate transactions. That would fix half of the latency problem. Previous iterations of the multi-blockchain PoI design used a window over the latest bridge states, trading off enforcing data freshness for continuing to function in the face of error.

Not all bridges support bridging arbitrary data. Previous iterations of the multi-blockchain PoI design did not have this as a requirement. So, this design would exclude those bridges.

I don’t think a recipientChain field is necessary. In fact, this field may be counterproductive. There is no reason the PoI could not serve as an attestation of the correct value to any and all chains it is bridged to.

indexerId: this would have to be validated with a signed message.

Some amount of leniency should be added to the constraint that “the submitted PoI is for a block number that is some threshold greater than the previously submitted PoI’s block number”. Sometimes progression may even be negative. For example: A submits a PoI on the DC, B submits a PoI on the DC, B allocates on the PC, A allocates on the PC. For small windows this is expected. You could mitigate this by not cross-checking Indexers’ block numbers, but that leaves other problems, in that the mechanism may no longer force recency.

As noted in your comment, this design requires that our contracts interact with the actual bridges. That’s possible, but it would make protocol iteration costlier. Previous iterations of the multi-blockchain PoI design relied more heavily on Arbitration following a specific set of guidelines for how to interact with a bridge, which allows for more rapid iteration. While this is acknowledged in the comment above, I think the impact is understated.

Not all chains progress forward at a probabilistically pre-determined speed. I could imagine a chain that progresses faster when under load, for example (holding block size constant and time variable, instead of vice versa). Even with today’s typical patterns, any difference between the projected speed and the actual speed is a security concern. When margins of error are taken into account, the mechanism may fail to enforce recency.

I’m not yet confident in any bridge design without at least an integrated light client on the PC and a concept of finality on the DC. I may be more pessimistic than most, since the world seems to keep turning even though there are no production examples of this to date that I’m aware of.

I’m assuming above that the previous version of the multi-blockchain PoI design is the “subgraph specifies a bridge” design, which itself received significant pushback at The Graph Foundation R&D Retreat.
