I think there’s a lot more that can be done to improve Indexers’ confidence when interacting w/ the dispute and slashing mechanisms:
Expose an optional endpoint in Indexer Service to request Proofs of Indexing (PoIs). Since PoIs are hashed w/ an Indexer’s public key, Indexers could cross-check against one another to check for inconsistencies without actually providing other Indexers w/ any data that could be used to submit a valid PoI on-chain and collect indexing rewards.
Update the Indexer Agent to cross-check PoIs before submitting them, rather than the current behavior, which is to check for inconsistent PoIs after they have been submitted.
Build integration testing or fuzzing tools to help Indexers spot determinism bugs across versions of Graph Node, across their DevOps configurations, or even across two different subgraphs that are intended to be functionally equivalent.
All of the above is being tracked internally at E&N, and I believe @eva is also tracking #3 (the testing and fuzzing tools) at The Graph Foundation and looking for grant submissions in this area to help Indexers build more confidence in the PoIs they are submitting.
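To make the first two items concrete, here is a minimal sketch of why exchanging PoIs need not leak anything claimable. It assumes, purely for illustration, that the PoI an Indexer submits is a hash over a per-subgraph digest (identical for every correct Indexer at a given block), the block number, and the Indexer’s address (standing in for the public key mentioned above). SHA-256 and this byte encoding are stand-ins rather than Graph Node’s actual scheme, and `final_poi`, `pois_agree`, `safe_to_submit`, `alice` and `bob` are hypothetical names.

```python
import hashlib
from typing import Dict


def final_poi(digest: bytes, block_number: int, indexer_address: str) -> str:
    """Hypothetical: bind a shared subgraph digest to a block and one Indexer's address."""
    addr = indexer_address[2:] if indexer_address.startswith("0x") else indexer_address
    h = hashlib.sha256()  # stand-in hash, not Graph Node's real construction
    h.update(digest)
    h.update(block_number.to_bytes(8, "big"))
    h.update(bytes.fromhex(addr))
    return "0x" + h.hexdigest()


def pois_agree(my_digest: bytes, block_number: int, peer_address: str, peer_poi: str) -> bool:
    """Recompute the peer's PoI from *our* digest; a match means our digests agree."""
    return final_poi(my_digest, block_number, peer_address) == peer_poi


def safe_to_submit(my_digest: bytes, block_number: int, peer_pois: Dict[str, str]) -> bool:
    """Hypothetical agent check: at least one peer answered and every answer matches."""
    return bool(peer_pois) and all(
        pois_agree(my_digest, block_number, addr, poi) for addr, poi in peer_pois.items()
    )


alice, bob = "0x" + "11" * 20, "0x" + "22" * 20
block = 12_000_000
digest = hashlib.sha256(b"entity changes up to block").digest()
bob_poi = final_poi(digest, block, bob)  # what Bob would hand out on request
print(safe_to_submit(digest, block, {bob: bob_poi}))  # True
```

Because a peer’s PoI is bound to the peer’s address, learning it only lets you check agreement; it is not a PoI you could submit for your own allocation. Requiring at least one peer response before `safe_to_submit` returns `True` is a policy choice made for this sketch.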
My inclination is to get as far as we can on the tooling side and establish a good baseline of behavior before tweaking economics, but here are some changes I am in favor of:
Only make allocated stake slashable as opposed to an Indexer’s entire stake.
Make shorter allocations slashable for a smaller % than longer allocations that collected more indexing rewards (right now both are slashed by the same amount).
Encode some of the rules described in the Arbitration Charter as smart contract logic.
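A rough sketch of how the first two changes could combine: only the stake allocated to the disputed subgraph is slashable, and the percentage is pro-rated by how long the allocation was open. The 2.5% ceiling and 28-epoch reference duration echo figures mentioned later in this thread; the linear ramp and the function names are illustrative assumptions, not protocol parameters.

```python
def flat_slash(self_stake: float, max_pct: float = 0.025) -> float:
    """Status quo as described above: the same % regardless of allocation size or age."""
    return self_stake * max_pct


def allocation_slash(allocated: float, epochs_open: int,
                     full_epochs: int = 28, max_pct: float = 0.025) -> float:
    """Hypothetical rule: slash only allocated stake, scaled by allocation age.

    Allocations open for `full_epochs` or longer are slashed at `max_pct`;
    shorter ones (which collected fewer indexing rewards) are slashed proportionally less.
    """
    if full_epochs <= 0:
        raise ValueError("full_epochs must be positive")
    age_factor = min(epochs_open, full_epochs) / full_epochs
    return allocated * max_pct * age_factor


# A 1M-GRT allocation closed after 7 of 28 epochs carries a quarter of the full penalty.
print(flat_slash(1_000_000))            # 25000.0
print(allocation_slash(1_000_000, 7))   # 6250.0
```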
There could be a reason for tweaking the Fisherman incentives, and @Oliver presents one such solution above that could make sense. At the moment, the problem feels too theoretical for me to comment on whether this might be necessary. The disputes that I’m aware of thus far have been very well researched by the Fisherman and haven’t looked anything like the spray-and-pray approach that you seem to imply might be incentivized. Once we give Indexers the tools to have greater confidence in their PoIs, I think we’re even less likely to see Fishermen put thousands of GRT at stake on a dispute simply as a gamble.
I will add some of the ideas discussed above to a “Future Work” section of the Arbitration Charter, though I don’t think these should block the charter itself, as the charter only moves us in the right direction by giving Indexers far more protections than they currently have.
In general I am in support of the charter as it stands today, but do share some of the concerns @KonstantinRM has kindly shared around the incentives and severity of punishment in the current proposal and support the refinements @Oliver has proposed.
One of the key reasons we are not seeing a lot of input on this topic is likely that we have neither the data nor the learned experience/wisdom to draw on. This is mostly theoretical for Indexers until they feel the impact in a real sense. I suspect that some of @KonstantinRM’s thoughts on this topic come from his experience thinking through the implications of the open disputes around PoIs for P2P (please correct me if I am wrong about that, Konstantin!), and more Indexers are going to develop an opinion on this topic when it impacts them or another Indexer they are close to. I think this dynamic plays well into having a robust, data-driven iteration process for the charter, at least for the early part of its implementation and refinement.
This ties into my concerns around the recent events surrounding the first fatal subgraph error on mainnet, and the reality we face of on-chain events where stakeholders end up being punished monetarily for first-time incidents outside their control or knowledge.
To summarise the event:
One of the ten Phase 1 migration subgraphs suffered a fatal indexing error at a specific block.
A fatal indexing error means that the Indexers allocating stake to that subgraph cannot continue syncing the subgraph.
If the Indexer didn’t catch the issue within two(?) epochs and manually settle the allocation, they were forced to settle it with a 0x0 PoI, which means they automatically forfeited all indexerRewards and queryFees for the allocation.
This occurred due to a bug in the subgraph rather than any action by the Indexer, yet the Indexer is punished immediately: they lose the rewards accrued over the lifetime of the allocation, and there is no arbitration over the application of this punishment (and I am of course imposing my own bias on this action by calling it a punishment).
The existing arbitration process specifically cites the fatal subgraph error as an example and states that rewards must not be collected for that allocation (Section 9).
My biggest concern (in the absence of technology improvements) is that events like the fatal subgraph error, and more broadly unexpected events that don’t fit the existing model of arbitration, have no home within the process and can instead breed contempt within the community through a sense of unfair play: that a stakeholder might be economically punished despite acting in good faith. I can see these issues causing stakeholders to act differently in terms of risk within the protocol. For example, after the fatal subgraph error some Indexers may decide to only support a small subset of subgraphs that they deem very stable, going against the in-protocol dynamics (signal, total allocations, etc.).
So going back to @Oliver’s quote: I feel that the “first fatal subgraph error on mainnet” incident during migration is a good example of a type of dispute that falls outside the more programmatic nature of the arbitration process as it stands today, and is the type of issue that could be solved with technology and “thoughtful directional community feedback to develop solutions that prevent such disputes from recurring” per Zorro’s suggestion. We need to be accommodating in officially recognising unexpected issues such as the first fatal subgraph error on mainnet, lest future first-time issues breed ill will within the community and net-negative incentive alignment within the protocol.
I would like to explore these thoughts further in Office Hours 14, and to understand whether others think that these sorts of events, which will likely be par for the course on mainnet, fall under the purview of arbitration (for punishment, compensation, and fixes to prevent recurrence) or whether they represent the price of playing the game and should/can be mitigated entirely through technology improvements (Graph stack enhancements, etc.).
Of course, I didn’t have enough time for this before because we were busy with new subgraphs. But with these disputes, I postponed other activities and focused on the Charter and on investigating our cases.
Let me clarify a little what I said about the Fisherman and his possible motives, based on our disputes.
Right now, if our script works correctly, the Network has several other Indexers with wrong PoIs.
All of these Indexers (which I found by accident while checking our own PoIs over the past few days) are fairly small, around 1M self-stake and 0 to 6M in delegations.
When we got these disputes, other Indexers obviously also had wrong PoIs, but they weren’t disputed, because our self-stake is relatively large and makes us a more attractive target for slashing under the current rules.
Another thing that looked bad from my point of view: the Fisherman created these disputes right after the Workshop (or pretty close to it; if I’m not mistaken, 05.05). It was a new address, funded with tokens from other sources, and he/she created 7 disputes against us and 2 against framework-labs, with just a few minutes between each of the 9 disputes. You can see them on Etherscan at address 0x992bb240b1ef27bc95a2e4767d9de6f8bf6d9632.
And yes, this worries me personally, because it looks more like an attempt to make money than an attempt to find someone who did something wrong, or an ongoing effort driven by concern for the Network.
This scenario wasn’t explicitly considered in the writing of the charter. I would be in favor of modifying the proposal to allow an Indexer to submit the last valid PoI when closing an allocation if their allocation was created before the subgraph error occurred. Let me know if this addresses your particular concern.
More generally, I totally agree with you here that this is an area where we’ll need to iterate as more community members become familiar with the mechanisms.
Without betraying any information shared in confidence, I can say that the timing of the disputes was because the Fisherman in this instance wasn’t sure if they were allowed to dispute Indexers for faults that occurred before the Arbitration Charter was ratified. This was clarified in the Protocol Townhall, hence the timing of the disputes. The reason for the new address, as I understand it, was to not create ill will among Indexers.
That being said, none of the mechanisms in the protocol are intended to be punitive to honest participants; this is one of the reasons that the Arbitration Charter allows the Arbitrator to exercise discretion. Over time, as Graph Node and Indexer tooling matures and Indexers build confidence interacting with the protocol, there will hopefully be fewer and fewer instances in which the Arbitrator must exercise this discretion.
For now, I know the Arbitrators are aware of the immature state of tooling in the protocol, and while I cannot speak for them, I do not expect that they will punish Indexers for inadvertent PoI inconsistencies, especially since as @cryptovestor notes, this is the first time many of these issues are being encountered in the decentralized network.
Update: The Arbitration Charter GIP has been updated based on a lot of the feedback in the above thread. It can still be found on the zerim/arbitration-charter branch of the GIPs repo. Looking forward to getting your feedback on the changes.
For reference, here are the recent commits (can’t wait until Radicle adds PR support):
5236bff (HEAD -> zerim/arbitration-charter, rad/zerim/arbitration-charter) arbitration-charter: Add items to future work based on community feedback
5fcac46 arbitration-charter: allow indexers to collect indexing rewards right after subgraph fails
90a2906 arbitration-charter: add "future work" section
9dd6b6b arbitration-charter: add clauses based on community input
326f8ec arbitration charter: Add missing links to forum discussions
665dbc6 Add proposal for arbitration charter
if a subgraph has a bug that prevents indexing up until the current epoch, then a zero PoI should be submitted and indexing rewards must not be collected for that subgraph.
This mechanism is problematic.
The PoI contains data which attests to the fact that the subgraph has failed and in what way. Eventually when we migrate to verifiable queries, the error status would be a part of the PoI in such a way that the error message could be validated as the correct response to any query.
For the protocol to be decentralized and the Arbitrator role to be removed, there needs to be a consensus around the failure state of the subgraph - which the PoI provides. Once one Indexer attests to the subgraph failing, the protocol should still incentivize further validation of that failure status until consensus is achieved. This may include incentivizing Indexers to start indexing from scratch - which could be expensive but necessary to validate the failure status.
The mechanism by which the protocol signals the value of further consensus of the failure state is curation. Once the failing subgraph is replaced, curation should move to the new subgraph, and this is the time that Indexers should migrate over.
Furthermore, there is still value in having the subgraph indexed for historical queries at least until such time as the failed subgraph is replaced (and even then, possibly for some fuzz testing to ensure compatibility between the old and new versions). Query fees should still be paid after a subgraph fails for both historical queries and for serving error responses. Otherwise there is no way for a Consumer to know whether the subgraph failed or the Indexer(s) just abandoned it save for some additional mechanism.
It is worth noting why not paying for failed subgraphs has been proposed. The idea is that once a subgraph has failed, it becomes trivial to index, and therefore there is no additional work that needs to be done or compensated. In theory, this is an attack vector, because a Curator could deploy a subgraph that fails quickly and then collect rewards on it indefinitely. However, this attack already exists for subgraphs that do not fail. One could, for example, specify a single call handler on a non-existent contract. Such a subgraph would also be trivial to index indefinitely. Since the mechanism does not protect against this attack, it is not helpful, and its drawbacks outweigh the potential benefits.
Attestation that is only correct with respect to the previous official version of the Indexer software
I think the language should account for the possibility that multiple previous versions of the software may have been sunset during the grace period and that any of them would be valid. There is other language to be modified to this effect in the same paragraph.
if a subgraph has a bug that prevents indexing up until the current epoch, then a zero PoI should be submitted and indexing rewards must not be collected for that subgraph.
Rereading my previous post on this while linking someone to this discussion, I realize I failed to mention what this should be changed to.
Instead, the same mechanism should be used to query the PoI for the same block as though the subgraph had not entered a deterministic failure state. Because of the way the PoI is constructed, it will continue to be updated and provide security.
This simplifies the indexer-agent as well as the logic as to how an Indexer should behave. The Indexer can continue to open and close allocations, and serve queries, as if nothing has happened. Historical queries would give results and queries after the failed block would give attestable errors.
If a subgraph has a deterministic failure, continue as normal by eventually closing the allocation with the usual non-zero PoI. If there is still curation on the failed subgraph, consider opening another allocation.
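The difference between the rule quoted above and the proposed behavior can be sketched as a small decision function. `closing_action`, `Rule` and `ZERO_POI` are hypothetical names; `poi_at_close` stands for whatever PoI Graph Node reports for the closing block, which under this proposal remains available after a deterministic failure. Treating any remaining curation signal as a reason to reallocate follows the last sentence above; everything else about allocation management is left out.

```python
from enum import Enum
from typing import Tuple

ZERO_POI = "0x" + "00" * 32  # 32 zero bytes, hex-encoded


class Rule(Enum):
    ZERO_POI_ON_FAILURE = "original clause: zero PoI, no indexing rewards"
    CONTINUE_AS_NORMAL = "proposal: usual non-zero PoI, keep serving"


def closing_action(rule: Rule, failed: bool, poi_at_close: str,
                   curation_signal: float) -> Tuple[str, bool, bool]:
    """Return (poi_to_submit, collect_rewards, open_new_allocation)."""
    if rule is Rule.ZERO_POI_ON_FAILURE and failed:
        # The allocation is abandoned: no rewards, and no reason to reallocate.
        return ZERO_POI, False, False
    # Close with the usual PoI; keep indexing while Curators still signal on it.
    return poi_at_close, True, curation_signal > 0


sample_poi = "0x" + "ab" * 32
print(closing_action(Rule.CONTINUE_AS_NORMAL, True, sample_poi, 500.0)[1:])  # (True, True)
```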
I generally agree with the charter, and I’m glad we are having this discussion here and that we have a fantastic arbitration team. The rules are very well thought through, and I think Indexers are generally happy with the level of clarity at this stage. There is clear evidence of the presumption of innocence, both in the charter and in the approach taken by the Arbitrators, the broader Graph team, and the community, which is great!
Although I am generally happy with how things are, I would like to share my view on a few points that I believe are worth considering.
I would like to agree with @KonstantinRM on the point of slashing based on the Indexer’s own stake. My argument is that while the current mechanism works well for slashing for incorrect query service, because of the economic security metrics etc., indexing rewards do not really depend on the Indexer’s self-stake, so it would make more sense to apply slashing to the indexing rewards rather than tie it to a % of the stake.
I would like to highlight an observation regarding the recent update to section 9:
An exception to this rule is if the allocation being closed was opened before the subgraph bug occurred. In this case, the Indexer may submit the last valid PoI they produced for the subgraph and collect indexing rewards for the allocation.
So the exception to the rule, meant to avoid penalising Indexers for a subgraph failure, is perfectly fine, but without clarity on how far back the last valid PoI may date, it creates the opposite incentive: to stay on the failed subgraph for as long as possible.
What is even more worrying is that if such behaviour is supported, we may see more precedents, and some Indexers may even deploy their own subgraphs that they know will fail at some point and signal on them, essentially creating an exclusive environment in which they enjoy rewards on a subgraph for which only they can submit a valid PoI, being the only ones allocated before the failure…
On subgraph failure, I agree with Zac’s point: as long as the failure was deterministic and caused by a bug from the subgraph developer, it makes sense for the network to keep indexing the subgraph, because it can still provide query results, even if not up to the latest block, and there is value in that. It would be interesting to consider an addendum to the charter with this condition.
Slashing could be a function of the allocated tokens and the allocation duration, so as not to overly penalize Indexers that rotate allocations more frequently; today they are exposed to more risk than slow-rotating Indexers. We would need to verify that the allocation security bond is enough to cover what an Indexer gets from rewards, to avoid any potential exploit.
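The concern can be made concrete with a small model. Assume, for illustration only, that indexing rewards accrue linearly per epoch that an allocation is open, and that the quoted exception applies to any allocation opened before the failure epoch. Without a limit, the rewards an early allocator keeps grow the longer they stay; `max_poi_age_epochs` is a hypothetical cap on how old the last valid PoI may be, not something the charter defines.

```python
from typing import Optional


def exception_applies(opened_epoch: int, failure_epoch: int, closing_epoch: int,
                      max_poi_age_epochs: Optional[int] = None) -> bool:
    """Whether an allocation may close with its last valid (pre-failure) PoI."""
    if opened_epoch >= failure_epoch:
        return False  # opened on an already-failed subgraph: no exception
    if max_poi_age_epochs is None:
        return True   # quoted wording: no limit on how stale the PoI may be
    return closing_epoch - failure_epoch <= max_poi_age_epochs


def rewards_kept(opened_epoch: int, failure_epoch: int, closing_epoch: int,
                 rewards_per_epoch: float,
                 max_poi_age_epochs: Optional[int] = None) -> float:
    """Rewards collected at close under the exception (0 if it does not apply)."""
    if not exception_applies(opened_epoch, failure_epoch, closing_epoch, max_poi_age_epochs):
        return 0.0
    return (closing_epoch - opened_epoch) * rewards_per_epoch


# Opened at epoch 10, subgraph fails at epoch 20: staying open longer pays more...
print(rewards_kept(10, 20, 40, 1.0), rewards_kept(10, 20, 100, 1.0))  # 30.0 90.0
# ...unless the last valid PoI is only accepted for a bounded number of epochs.
print(rewards_kept(10, 20, 100, 1.0, max_poi_age_epochs=28))           # 0.0
```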
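The check described in the last sentence can be sketched as follows. It assumes, hypothetically, that both the slash and the rewards grow linearly with the allocated tokens and the allocation duration, in which case the bond covers the rewards exactly when the per-epoch slashing rate is at least the per-epoch reward rate per token. The figure of 3000 GRT per day per million allocated is quoted later in this thread for one subgraph, and treating an epoch as roughly a day is an approximation; none of these are protocol constants.

```python
def slash_amount(allocated: float, epochs: int, slash_pct_per_epoch: float) -> float:
    """Hypothetical duration-based slash: grows with allocation size and age."""
    return allocated * slash_pct_per_epoch * epochs


def rewards_earned(allocated: float, epochs: int, reward_per_token_per_epoch: float) -> float:
    """Indexing rewards, assumed linear in allocation size and age."""
    return allocated * reward_per_token_per_epoch * epochs


def bond_covers_rewards(allocated: float, epochs: int, slash_pct_per_epoch: float,
                        reward_per_token_per_epoch: float) -> bool:
    """True if being caught costs at least as much as the allocation earned."""
    return (slash_amount(allocated, epochs, slash_pct_per_epoch)
            >= rewards_earned(allocated, epochs, reward_per_token_per_epoch))


# 2.5% spread over 28 epochs vs ~3000 GRT/day per 1M GRT (0.003 GRT per token per epoch).
print(bond_covers_rewards(1_000_000, 28, 0.025 / 28, 0.003))  # False
```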
Update: I have updated clause 9 of the arbitration charter based on the feedback above, and have pushed the changes to Radicle (commit # 7b9cfa81242697b86ca87f9d09e5d1fabe87399c).
I accept Zac’s point above that it could be useful to historically query a broken subgraph, or even know that a subgraph is broken, and what the error is, which is only possible if Indexers are online to report on its broken state.
Also, philosophically, it makes sense to me that it is the Curators who signaled on the subgraph that should determine when they no longer wish a broken subgraph to be indexed by migrating their signal. There is no need for the protocol to make this decision on their behalf.
I disagree with @KonstantinRM’s suggestion to only make indexing rewards slashable and not Indexer’s principal stake. Such a change would make the expected profit positive for a whole class of economic attacks, because there would essentially be no downside to attempting such attacks other than the opportunity cost of capital involved in the attack, which in a near-zero interest rate environment is relatively low.
I agree, however, w/ Ariel’s point about possibly only making allocated stake slashable, as opposed to an Indexer’s entire stake. This is actually already listed under “Future Work” in the GIP because this requires a code change and thus goes beyond the narrow norm-setting capabilities of the Arbitration Charter.
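The expected-profit argument against making only indexing rewards slashable can be put into a back-of-the-envelope model. This is a simplification for illustration: the attacker collects `reward` by submitting an invalid PoI, is caught with probability `p_caught`, loses `penalty` if caught, and pays an opportunity cost on the capital involved. All numbers in the example are hypothetical.

```python
def expected_attack_profit(reward: float, p_caught: float, penalty: float,
                           capital: float, interest_rate: float) -> float:
    """Expected profit of an attack, minus the opportunity cost of the capital."""
    return reward - p_caught * penalty - capital * interest_rate


reward, capital = 10_000.0, 1_000_000.0
p_caught, rate = 0.5, 0.001  # hypothetical detection odds and period interest rate

# Rewards-only slashing: the worst case is giving the reward back.
rewards_only = expected_attack_profit(reward, p_caught, reward, capital, rate)
# Stake slashing: lose the reward plus 2.5% of the stake.
with_stake = expected_attack_profit(reward, p_caught, reward + 0.025 * capital, capital, rate)
print(rewards_only > 0, with_stake > 0)  # True False
```

With rewards-only slashing the expected profit is `(1 - p_caught) * reward` minus the opportunity cost, which stays positive for any detection probability below 1 whenever that cost is small.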
We still need to clarify one question (I believe the charter clarified it well enough, but people are still discussing it):
If a subgraph has failed, is it OK to start allocating on it after the failure or not? Right now we have a situation where Indexers can keep an allocation open on a failed subgraph if they were already there, but can’t open an allocation on an already failed subgraph.
Speaking about “allocated stake”, it looks much better than the current state, because right now someone can create a new Indexer with 1M self-stake, delegate 16M to it, and do whatever they want. Why:
Nobody will create disputes against them, because the Fisherman’s potential payout (1.25% of the Indexer’s self-stake) is too small relative to their deposit. On top of that, the Fisherman’s tokens are frozen for up to 56 epochs while waiting.
The potential rewards from 17M allocated maliciously will be much higher than a 2.5% slash of the 1M self-stake, especially when reallocating once every 28 epochs.
You can slash a percentage of the owned stake based on the allocated stake, so it’s not that bad.
Say you have 100M of own stake and 500M allocated in total, and you allocated only 50M to a subgraph; then slashing applies to 10% of your own stake (essentially the share corresponding to that allocation).
Indexers are starting to allocate to Opyn again. Any Indexer with a small stake can jump on this with a low likelihood of disputes, since the numbers are too small to warrant one. With current numbers, GRT earned per million allocated is something like 3000 GRT per day. They will likely eventually close allocations with the last good PoI, or any PoI that works, because they are less concerned about getting slashed at such a high rate of indexing rewards.
Maybe this is less of a problem once we have a lot more signal, but it’s definitely a strange/unique situation right now. There is no pressing incentive (for now) for the signaller to move that Opyn signal until they upgrade their subgraph. I agree with @Brandon’s thoughts that signal is ideally how these dynamics become irrelevant in the future.
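The arithmetic behind these two points can be sketched as below. It assumes, consistent with the 2.5% and 1.25% figures above, that the Fisherman receives half of the amount slashed, and that only self-stake is slashed. The key observation is that the Fisherman’s payout depends on self-stake alone, while the rewards at risk depend on the full allocated amount. Function names are illustrative.

```python
SLASH_PCT = 0.025       # slashing percentage cited above
FISHERMAN_SHARE = 0.5   # assumption: 1.25% / 2.5% of self-stake goes to the Fisherman


def fisherman_payout(self_stake: float) -> float:
    """Fisherman reward if the dispute succeeds; delegated stake plays no part."""
    return self_stake * SLASH_PCT * FISHERMAN_SHARE


def indexer_exposure(self_stake: float, delegated: float) -> float:
    """Slash as a fraction of the stake the Indexer can allocate."""
    return self_stake * SLASH_PCT / (self_stake + delegated)


# 1M self-stake + 16M delegated, as in the example above.
print(fisherman_payout(1_000_000))                       # 12500.0
print(f"{indexer_exposure(1_000_000, 16_000_000):.4%}")  # 0.1471%
```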
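The pro-rata rule in this example can be written as a one-line formula. Applying the 2.5% slashing percentage to the pro-rated base is an assumption for illustration; the post above only specifies the base, and `prorated_slash_base` is a hypothetical name.

```python
def prorated_slash_base(own_stake: float, total_allocated: float,
                        disputed_allocation: float) -> float:
    """Portion of own stake exposed to a dispute: own stake x allocation share."""
    if not 0 < disputed_allocation <= total_allocated:
        raise ValueError("disputed allocation must be positive and within the total")
    return own_stake * disputed_allocation / total_allocated


base = prorated_slash_base(100_000_000, 500_000_000, 50_000_000)
print(base)                 # 10000000.0
print(round(base * 0.025))  # 250000 (assuming the 2.5% slashing percentage)
```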
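Using the figure quoted above of roughly 3000 GRT per day per million GRT allocated, a quick comparison shows why a small Indexer may not be deterred. This assumes the rate stays constant over the allocation and that the worst case is the 2.5% slash of self-stake discussed earlier in the thread; both are simplifications, and the real per-token reward rate falls as more Indexers allocate to the same subgraph.

```python
GRT_PER_MILLION_PER_DAY = 3_000  # rough figure quoted above, at current numbers
SLASH_PCT = 0.025                # worst-case slash of self-stake, as discussed earlier


def rewards(allocated: float, days: int) -> float:
    """Indexing rewards, assuming a constant per-token rate."""
    return allocated / 1_000_000 * GRT_PER_MILLION_PER_DAY * days


def reward_to_slash_ratio(self_stake: float, allocated: float, days: int) -> float:
    """How many times over the rewards exceed the worst-case slash."""
    return rewards(allocated, days) / (self_stake * SLASH_PCT)


print(rewards(1_000_000, 28))                                     # 84000.0
print(round(reward_to_slash_ratio(1_000_000, 1_000_000, 28), 2))  # 3.36
```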