Thank you for the further clarification and your detailed account of the mistakes that were made. This really helps us track down what went wrong and will hopefully help us avoid such situations going forward.
Allow me to elaborate a bit more on what actually happened on our side.
I am totally with you that one should test such a tool on the testnet; however, that was not the cause of the problem here. The allocation tool is not yet used in an automated way at all — it only gathers information and suggests rules to be added to the indexer agent. And we had not even gotten that far.
Instead, we set up a global rule `decisionBasis=rules allocationAmount=10.0 parallelAllocations=1 minSignal=500` to pick up and pre-index all new subgraphs before actually allocating to them and taking them into production, thus giving the community access to new subgraphs as fast as possible. In the meantime we have learned that pre-syncing is also possible with `parallelAllocations=0`, but we would rather hand-pick subgraphs going forward.
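For concreteness, the setup above roughly corresponds to the following indexer CLI commands (a sketch only; the exact `graph indexer rules` syntax may differ between CLI versions, and `<deployment-id>` is a placeholder):

```
# Global rule we used: pre-index and allocate to everything above the signal threshold
graph indexer rules set global decisionBasis rules allocationAmount 10.0 parallelAllocations 1 minSignal 500

# What we learned afterwards: sync a deployment without allocating to it
graph indexer rules set <deployment-id> parallelAllocations 0

# Skip a broken deployment entirely
graph indexer rules set <deployment-id> decisionBasis never
```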
This led to a situation where the indexer agent picked up all the newly released broken subgraphs and, even worse, the bot-bait subgraphs that were immediately removed again. In this case the indexer agent was not able to close the allocation automatically, because the subgraph was no longer available. We first tried simply setting the rule for these subgraphs to `never`, but due to missing metadata the indexer agent just errored and stopped trying until the next restart.
We then tried manually removing only the failing allocation, but this just led to the indexer agent getting stuck on the next missing subgraph, which is why we ultimately decided to take the hit, manually close all open allocations (roughly 50, each transaction costing around $30), and start fresh.
The devil is probably in the details here. Yes, we closed the allocation with the wrong POI. Why it was wrong in the first place is what we still need to figure out.
However, as mentioned, our indexer agent was completely clogged with broken and bait allocations and thus not working at all, so we treated this as a bug and followed the recommended procedures.
Honestly, we really tried to do the right thing here. We could have just closed everything with a 0x0 POI, but we thought it would be cleaner to provide the POI we were able to produce.
First of all, I would like to thank you for explaining this so clearly and sending the links along. Going forward, I hope on the one hand that we will no longer have to close allocations manually, and on the other hand that we will use these tools if it ever comes to that.
However, I would like to ask how these differences in the POIs can arise in the first place. To me this points to a certain information asymmetry: if you consult sources like The Graph Academy (which was also sponsored by a grant from the Foundation), you will only find instructions to determine the POI the way we did (Failed subgraphs - Manually Closing Allocations - The Graph Academy). Neither there nor in the official documentation (Indexer | Graph Docs) is it mentioned that this POI should be questioned again. The official documentation already has a section dealing with security, infrastructure, and POIs, but there is no mention of this either.
Don’t get me wrong, we will use the graphprotocol-poi-checker and keep this in mind in the future. However, this is information that, in our view, is not readily available. If you rely on the common sources, there is nothing about cross-checking the POI with an additional tool (as you would expect the query shown at The Graph Academy to yield a correct result).
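To make the cross-check concrete: below is a minimal sketch of how one can ask one's own index node which POI it would produce, assuming graph-node's index-node status API (served by default on port 8030 at `/graphql`) and its `proofOfIndexing` query. The deployment hash, block, and addresses are hypothetical placeholders, and the exact field names should be verified against the graph-node version in use.

```python
import json

def build_poi_query(deployment: str, block_number: int, block_hash: str, indexer: str) -> dict:
    """Build the GraphQL payload asking the index node which POI it
    would produce for `deployment` at the given block."""
    query = (
        f'{{ proofOfIndexing(subgraph: "{deployment}", '
        f'blockNumber: {block_number}, '
        f'blockHash: "{block_hash}", '
        f'indexer: "{indexer}") }}'
    )
    return {"query": query}

# Hypothetical values for illustration only.
payload = build_poi_query(
    "QmExampleDeploymentHash",
    1234567,
    "0x" + "ab" * 32,
    "0x" + "cd" * 20,
)

# This payload would then be POSTed to the index-node endpoint, e.g.:
#   curl -s http://localhost:8030/graphql \
#        -H 'Content-Type: application/json' -d '<payload>'
print(json.dumps(payload))
```

Comparing the value returned here with the POI one is about to submit on-chain would have caught our mistake before the dispute.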
We do not want to discredit anyone with this statement. We appreciate the work of everyone in this community and think it is fantastic how much has been and is being done, and we try to do our part as well. Nevertheless, this was a mistake that was not foreseeable and was made without any negative or malicious intent.
This mistake could have happened to many indexers who were in our situation and tried to index as many subgraphs as possible. Our transaction history also shows that we were not aiming to cheat any indexing rewards by doing this; by being forced to close the allocations manually, we actually lost money.
Fundamentally, this problem means that many new subgraphs do not find indexers right away. Our goal was to help the ecosystem and new subgraphs by allocating to them and indexing them directly. Because the indexer software cannot handle broken subgraphs in this situation, we were forced to close allocations manually. In the future we will have to select subgraphs by hand, as anything else would pose too great a risk (in terms of both time and money).
This is “just” a follow-up error, since the indexer agent somehow managed to pick up the freshly closed allocation and create a new one for this subgraph. The POI is obviously still invalid, since nothing has changed in the meantime.
All that being said, we still need to figure out what actually went wrong with the POI and how to proceed. We provided the requested data, but I am still unsure how to determine at which block our indexer diverged and how to locate the erroneous data. Wouldn’t we need a correct database dump to even make the comparison?
Is there any guide or forum thread we could follow? With the current tooling at hand, we can only say that the data is somehow wrong, but not why this happened. Figuring out the why and fixing the root cause seems to be the most important issue, since obviously no one wants to serve incorrect data.
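In case it helps others in the same spot, here is the kind of approach we imagine for narrowing down the divergence, assuming one can obtain both one's own POI and a reference POI for any block (e.g. from a cooperating indexer). The two `*_poi` callables below are stand-ins for such lookups, not a real API:

```python
from typing import Callable

def first_divergent_block(
    my_poi: Callable[[int], str],
    ref_poi: Callable[[int], str],
    start_block: int,
    end_block: int,
) -> int:
    """Binary-search for the first block in [start_block, end_block] where
    the two POIs differ. Assumes the POIs agree before the divergence point
    and keep disagreeing from it onwards."""
    lo, hi = start_block, end_block
    while lo < hi:
        mid = (lo + hi) // 2
        if my_poi(mid) == ref_poi(mid):
            lo = mid + 1   # still in agreement: divergence is later
        else:
            hi = mid       # already diverged: divergence is here or earlier
    return lo

# Demo with stand-in POI lookups: pretend the data diverged at block 700.
mine = lambda b: "poi-good" if b < 700 else "poi-bad"
reference = lambda b: "poi-good"
print(first_divergent_block(mine, reference, 0, 1000))  # -> 700
```

With roughly log2(chain height) POI comparisons this would pinpoint the first bad block, after which one could inspect the entities written at that block; but it still presupposes access to a trustworthy reference POI, which is exactly what we are missing.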
So, any help and pointers really appreciated here!
Thanks again for your explanation and the valuable information you provided. We are very curious to see how this procedure will turn out, and we would like to emphasize again that we had no bad intentions. Of course we hope for a fair judgement, and even more importantly, we hope that this dispute will transparently show how we work, and highlight the various aspects in which the availability and distribution of information, as well as the indexer agent itself, could be improved after the dispute.