GIP-0058: Replacing Bonding Curves with Indexing Fees

GIP: 0058
Title: Replacing Bonding Curves with Indexing Fees
Authors: Justin Grana, Howard Heaton
Created: 2023-08-08
Updated: 2023-08-08
Stage: Draft
Discussions-To:
Category: curation
Implementations: NA


Amended 12/1/23. The original document is unchanged except for “(see amendment)” notations. A new “Amendment” section has been added.

Abstract

A key step toward creating unstoppable dapps is a system whereby Indexers and data consumers can coordinate to get subgraphs indexed. The current coordination mechanism, curation with bonding curves, is rife with inefficiencies. This document provides a detailed description and rationale for indexing fees via a new mechanism that would replace (see amendment) curation to enable efficient coordination among Indexers and developers. With indexing fees, Indexers publicly post their price per unit of subgraph gas, and consumers (or someone acting on their behalf) choose Indexers to perform indexing at the posted price. Security of the payments is ensured through on-chain collateralization contracts. While consumers can select Indexers directly, the mechanism also supports automated Indexer selection for scalability.

Executive Summary

Problem: The Graph’s current curation mechanism is inefficient for several reasons.

  1. Subgraphs cost varying amounts to index, making it difficult for Indexers to predict GRT rewards.
  2. Indexer payments are uncertain and volatile due to staking decisions in disparate parts of the network.
  3. There is a noisy relationship between curation signal and the quality of indexing service for a subgraph.

Proposed Solution: Indexing fees are collected via a new mechanism that proceeds as follows.

  1. Indexers publicly post a price per unit of work to index a subgraph.
  2. A consumer chooses which Indexers it would like to contract to index the subgraph.
    (More likely, an algorithm acting on behalf of the consumer can choose for them.)
  3. The selected Indexers index the subgraph and then submit a POI that verifiably and deterministically states
    how many units of work it took to index the subgraph.
  4. The consumer pays the Indexer according to the posted price and the units of work.
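The payment rule in step 4 is a simple product of posted price and metered work. Here is a minimal Python sketch; the function name and the example price are illustrative, not protocol constants:

```python
def indexing_payment(price_per_gas_unit: float, gas_units_used: float) -> float:
    """Payment owed to an Indexer: the posted price times the metered work.

    Both names are illustrative; the protocol meters work in "subgraph
    gas" and denominates prices in GRT.
    """
    return price_per_gas_unit * gas_units_used

# A posted price of 0.002 GRT per gas unit and a POI reporting
# 1,000,000 gas units of work yields a 2,000 GRT payment.
payment = indexing_payment(0.002, 1_000_000)
```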

Benefits: In addition to its simplicity, indexing fees yield the following favorable properties.

  1. Indexer compensation depends directly on how resource intensive it is to index a subgraph.
  2. Indexer revenue per unit of work is perfectly predictable and is not volatile.
  3. The relation between consumer payment (per unit of work) and number of Indexers is perfectly predictable. This makes the relationship between consumer payment and quality of service more transparent.

Although indexing fees represent a radical departure from curation, they add clarity and predictability to the protocol by limiting uncertainty. This will enable an efficient and scalable marketplace for indexing services.

Motivation

Decentralized and permissionless data services provide the robustness and censorship resistance required to build unstoppable apps. However, the pseudonymous nature of a permissionless decentralized environment presents new challenges, especially in regard to novel attacks such as sybil attacks (i.e. where participants are able to improve their outcomes by splitting their identities) and ensuring parties are able to establish sufficient trust. To address these concerns, decentralized protocols (including The Graph) aim to implement mechanisms to deter behavior that is not in the spirit of the protocol while maintaining an efficient market. The Graph’s current curation system is meant to serve that purpose.

Today, coordination for indexing uses bonding curves. Each subgraph has a bonding curve on which curation shares are minted when a user deposits GRT. Each subgraph’s bonding curve is unique. Indexers are incentivized to index subgraphs with high signal (i.e. much GRT) so that they can collect indexing rewards. Under this mechanism, the rewards Indexers get on a particular subgraph fluctuate depending on the signal of other subgraphs and the staking decisions of other Indexers; this leads to volatility and uncertainty for Indexers and unpredictable quality of service for consumers.

Proposed Solution

To address the shortcomings of curation, we propose using indexing fees, whereby Indexers publicly post a price per unit of work for which they are willing to index subgraphs. Consumers are able to choose Indexers to contract to index their subgraphs; often, we expect an algorithm will automate this process on behalf of consumers. The selected Indexers index the subgraph and submit a POI that verifiably and deterministically states how many units of work it took to index the subgraph. The consumer pays the Indexer according to the posted price and the units of work.

A key departure from curation is that, with indexing fees, the incentive to index does not come from indexing rewards, but rather from a direct transfer of GRT from a consumer to an Indexer. Today, Indexers are primarily compensated for indexing activity via indexing rewards. With Horizon, indexing rewards will be used to subsidize protocol security in the form of payments to collateral (see amendment). Therefore, indexing rewards will no longer be used to incentivize indexing activities. Instead, the incentive to index is provided through payments that flow directly from the consumer to the Indexer.

To fully document and explain the proposed mechanism, the remainder of this document proceeds as follows. First, we identify key terms and then the desired properties of a mechanism. The main details of indexing fees follow, with an illustration of how they meet the desiderata. The next section is dedicated to an example automated Indexer selection algorithm. Frequently asked questions are addressed in the following section, followed by the conclusion. An appendix (to be linked) includes an important analysis of subgraph gas and how it relates to the computational resources required to index a subgraph.

Terms and Background

Indexers: Indexers are node operators in The Graph Network that stake Graph Tokens (GRT) in order to provide indexing and query processing services. Indexers earn query fees and indexing rewards for their services.

Consumers: The core users of data services are referred to as consumers. Specifically, the consumer is the one that receives a benefit (and thus is willing to pay) for data services. In some cases, this may be the developer. In other cases, a consumer may want to leverage the subgraph of another developer. In general, the consumer is the party that interacts with Gateways/Indexers for the indexing services.

Subgraph Gas: There are two notions of subgraph gas: sync gas and storage gas. Storage gas is simply the size of entities stored in the database, which correlates with the size of a subgraph on disk. Sync gas captures the amount of compute and disk operations required to sync a subgraph, write it to disk, and perform network operations (e.g. JSON RPC). Both storage gas and sync gas are verifiable quantities. A detailed relationship between subgraph gas and compute resources is given in Appendix A.

Horizon (see amendment): Graph Horizon introduces a general purpose scheme for securing bilateral transactions in The Graph ecosystem. The core aim is to establish trust via on-chain means; to do this, Horizon utilizes an escrow-like service with the capability of penalizing bad behavior (via burning tokens of the party at fault). Horizon “establishes trust” via economic incentives from smart contracts and allows “freely evolving services” due to a fundamental shift in how governance operates. With respect to the scope of this document, the only essential information to know about Horizon is that it enables on-chain agreements to be formed for coordinating procurement of services and arbitration. The details of Graph Horizon will be defined in a separate GIP.

Agreement: An agreement is used to identify the terms of service between an Indexer and a consumer. For example, a caricature agreement might be “Indexer will index a subgraph within 2 days for 1 GRT per unit of subgraph gas.” Agreements are often formalized via smart contracts; however, an off-chain agreement may also be formed between a consumer and a Gateway acting on the consumer’s behalf.

Desiderata

The desiderata (the desired properties of a new mechanism) address the issues with the current curation mechanism. Specifically, once agreements are reached, the Indexer should have high certainty about its profit and revenue, and that certainty should not wane with time. On the consumer side, once agreements are reached, the price paid and the quality of service received should both be predictable and steady over time. Reducing these uncertainties yields the desirable higher-level properties of simplicity and efficiency. Of course, this must also be done in a sybil-resistant manner to preserve the robustness and censorship-resistance benefits of a decentralized platform. The remainder of this section details each of the desiderata.

High Certainty – Indexer Revenue
Indexers should be able to forecast how much revenue they will achieve by indexing a subgraph. Uncertainty may arise when indexing revenue is influenced by actions of other participants (e.g. curators adjusting signal with bonding curves, Indexers unexpectedly allocating a large amount of stake on a subgraph).

High Certainty – Indexer Profitability
Indexers cannot know the total cost to index subgraphs beforehand. To ensure Indexers are profitable, it follows that there must be a way to compensate them proportional to their incurred indexing costs.

Low Price Volatility
Even if an Indexer knows the price today, it can be difficult to reason about how much payments for indexing services will fluctuate over time. This similarly affects developers when making plans to use data services for their dapps. Cognitive load and pricing efficiency can typically be improved by reducing volatility.

Low QoS Uncertainty
The consumer should be able to easily see how much it needs to pay to achieve a given quality of service. Furthermore, the consumer should have the flexibility of obtaining a high quality of service by selecting several competent Indexers or by selecting fewer but more effective Indexers.

Sybil Resistant
A key property of blockchains is that identities can be easily duplicated across several wallet addresses. To ensure developers have their dapps serviced by multiple Indexers (e.g. for robustness to failure of any single Indexer), developers must be able to ensure, with reasonable confidence, that they can attract distinct Indexers to index their subgraphs.

Efficiency
The mechanism (approximately) maximizes the number of mutually beneficial transactions.

Simple
We aim to create a simple market. We consider this along two dimensions: actions and consequences are simple to understand (low cognitive overhead) and tasks are simple to perform (efficient workflows).

Proposal

Sequence of Events and Details

Under the indexing fees mechanism, the order of events can be summarized as follows. We then provide a detailed description of each step.

Step 1: Indexers post prices per unit of subgraph gas with a sync-speed warranty per unit of subgraph gas.

Step 2: Consumer (or agent acting on behalf of consumer) selects Indexers.

Step 3: Consumer and Indexers both enter an on-chain agreement that specifies:

  1. Required Indexer collateral.
  2. Terms of agreement (price per unit of subgraph gas, correctness warranties, slashable events, arbitrating party, collateral thawing period, etc.). The contract is enforced when the consumer executes a transaction to accept the terms and deposits funds to pay for the indexing services and the Indexer deposits the time-locked collateral.

Step 4: Indexer indexes the subgraph.

Step 5: Indexer posts POI.

Step 6: Dispute period begins.

  1. Within the thawing period defined in the agreement, the consumer can choose to raise a dispute that, if the arbitrator determines it to be valid, results in the Indexer being slashed according to the terms of agreement.
  2. The Indexer cannot withdraw its payment nor its collateral until the dispute period ends.

Step 7: Indexer retrieves any funds (collateral and payment) after the dispute period.

Step 8: Indexer periodically reports ongoing sync gas and ongoing storage gas and withdraws funds from the contract accordingly.

Step 9: When the contract is over, the Indexer can withdraw collateral for the sync speed warranty.

Step 1 — Indexers Post Prices

Simply stated, Indexers will submit the prices they charge to index a subgraph to the Gateway. This is similar to the current process for the Indexer selection algorithm for queries. Importantly, these prices are in terms of metered computational units of work and also contain a sync-speed warranty. The motivation for metered pricing is to account for subgraph heterogeneity and to incentivize dapp developers to optimize their subgraphs. One of the main challenges with previous versions of curation was that payment to Indexers did not depend on the amount of work (compute, storage, etc.) required to index a subgraph. With indexing fees, Indexer pricing is a function of the computational resources required to index a subgraph. As an initial protocol design choice, prices will be in terms of “subgraph gas,” an approximation of the computational resources needed to index a subgraph. See Appendix A for more details on the relationship between subgraph gas and compute costs.

The sync-speed warranty provides a channel for Indexers to credibly differentiate the quality of their hardware and thus earn more revenue by deploying better hardware. By posting sync-speed warranties per unit of compute (subgraph gas), Indexers with better hardware can post faster sync-speed warranties and thus charge a higher price. For more details on how different hardware configurations impact sync time, see Appendix A. Furthermore, Indexers can post multiple prices with multiple sync-speed warranties if they want to offer different levels of service at different prices.
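To illustrate what a posted offer might contain, here is a hedged Python sketch. The field names and units (GRT per gas unit, warranted gas per hour) are assumptions for exposition, not a protocol-defined schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PricePosting:
    """One posted offer: a price paired with a sync-speed warranty.

    Field names and units are illustrative assumptions, not part of
    the protocol specification.
    """
    indexer_id: str
    price_grt_per_gas: float       # GRT charged per unit of subgraph gas
    warranted_gas_per_hour: float  # sync speed the Indexer warrants

# An Indexer may post several tiers at once: better hardware backs a
# faster warranty at a higher price.
postings = [
    PricePosting("indexer-A", 0.0030, 2_000_000),  # premium tier
    PricePosting("indexer-A", 0.0015, 500_000),    # budget tier
]
```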

Step 2 — Indexer Selection

Indexer selection can happen in two ways: 1) directly by the consumer, or 2) through an automated Indexer selection algorithm that optimizes for consumers’ preferences (e.g. quality of service). An automated Indexer selection algorithm is discussed in detail below. This section focuses on the case where consumers choose their own Indexers. Under manual Indexer selection, a consumer is presented with a menu of Indexers and relevant features (price per unit of gas, geographical location, quality of service statistics, etc.). Note that quality of service stats are trusted data in the case of Gateway usage and untrusted if manually collected. The consumer can then select one or more Indexers, based on its preferences, to index its subgraph. Importantly, at this point neither the consumer nor the Indexer has entered into an agreement through a smart contract. The selection step notifies Indexers that there is a smart contract that, if they choose to enter it, would make them eligible for a payment upon successful indexing of a subgraph.

Step 3 — Consumer and Indexer Enter Agreement

After the Indexer is notified, the consumer and the Indexer must both opt into the agreement. The consumer opts in by depositing GRT into a smart contract that would then be transferred to the Indexer upon successful indexing. The Indexer deposits collateral into the smart contract that would be slashed if indexing did not happen according to the agreement’s parameters. The smart contract specifies several parameters of the agreement, including the price per unit of subgraph gas, the required amount of Indexer collateral, the maximum amount of gas the Indexer should spend indexing the subgraph, slashable events and their parameters (for example, how much an Indexer should be slashed for not meeting their sync-speed warranty), the arbitrating entity, and the collateral thawing period.
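The agreement parameters above can be sketched as a simple data structure. All field names and example values here are illustrative assumptions; the actual contract schema is deferred to the detailed specification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexingAgreement:
    """Illustrative terms of the Step 3 on-chain agreement.

    Every field name is an assumption for exposition, not a
    protocol-defined schema.
    """
    price_grt_per_gas: float       # posted price the consumer accepted
    indexer_collateral_grt: float  # slashable deposit from the Indexer
    max_gas: int                   # cap on billable subgraph gas
    slash_fraction: float          # share of collateral slashed per violation
    arbitrator: str                # entity that resolves disputes
    thaw_period_days: int          # dispute window before withdrawals

    def max_consumer_cost(self) -> float:
        """Worst-case payment: the gas cap times the agreed price."""
        return self.max_gas * self.price_grt_per_gas

# A consumer accepting 0.002 GRT/gas with a 5M-gas cap risks at most
# 10,000 GRT in indexing fees under this agreement.
agreement = IndexingAgreement(0.002, 10_000, 5_000_000, 0.5, "arbiter", 7)
```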

Step 4 — Indexing Occurs

This step is identical to how The Graph functions today.

Step 5 — Indexer Posts POI

This step is identical to how The Graph functions today.

Step 6 — Enter Dispute Period

After the Indexer submits a POI the dispute period begins. Within this period, the consumer has the option to submit a dispute to the arbitrator (specified in step 3). If the arbitrator determines that the terms in step 3 were violated by the Indexer, the Indexer is slashed according to the agreement. The portion of slashed GRT that is returned to the consumer versus the portion that is burned is specified in step 3. During the dispute period, both the consumer’s funds and the Indexer’s funds are frozen in the contract.

Step 7 — Indexer Retrieves Funds

After the dispute period, the Indexer retrieves its entitled funds (both collateral and payment) that remain after any slashing events. Of course, because the collateral and slashing provide adequate incentive for Indexers to meet the terms in step 3, step 7 will often consist of the Indexer withdrawing its full collateral and the consumer’s payment for indexing services.

Step 8 — Continual Syncing and Payment

Since subgraphs undergo continual syncing, the Indexers post the incremental sync gas and disk space required to fully sync the subgraph. After a dispute period and assuming no slashable events, the Indexer can withdraw additional funds as compensation for the additional syncing resources.
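The incremental settlement in this step can be sketched as follows; the function and its inputs are illustrative assumptions, not protocol definitions:

```python
def incremental_payment(price_grt_per_gas, cumulative_gas_reports):
    """GRT owed at each periodic report for newly synced gas.

    `cumulative_gas_reports` holds the total gas at each checkpoint;
    the Indexer withdraws only the increment since the prior report.
    Names are illustrative.
    """
    payments = []
    previous = 0
    for cumulative in cumulative_gas_reports:
        payments.append((cumulative - previous) * price_grt_per_gas)
        previous = cumulative
    return payments

# Reports at 1.0M, 1.5M, and 1.8M cumulative gas units at a price of
# 0.002 GRT per unit settle 2000, 1000, and 600 GRT respectively.
payments = incremental_payment(0.002, [1_000_000, 1_500_000, 1_800_000])
```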

Step 9 - Indexer Retrieves Remaining Funds

Although a correctness warranty can be reclaimed at the end of a dispute period (e.g. whenever a POI is submitted plus a fixed time), the sync-speed warranty can only be reclaimed at the end of the agreement. A POI may be required periodically to checkpoint syncing, but submitting a POI and collecting payment for the subgraph synced so far does not obviate the need for the sync-speed warranty to continue for the portion of the subgraph yet to sync past that checkpoint.

Desiderata Revisited

Given the full description of indexing fees in the previous section, it is now possible to show how the new mechanism meets the desiderata:

High Certainty – Indexer Revenue
The Indexer’s price (per unit of “work”) is perfectly predictable. It is simply the posted price. This is not affected by the behavior of other Indexers.

High Certainty – Indexer Profitability
Because Indexers post prices per unit of gas, they can charge more for more expensive subgraphs, thus reducing profitability uncertainty.

Low Price Volatility
The price an Indexer receives is fixed and does not change over time as a result of other Indexers’ staking decisions.

Low QoS Uncertainty
Each Indexer’s price and sync-speed warranty are publicly posted. Therefore, if a consumer knows the size of its subgraph, it can compute exactly how much it will cost for each additional Indexer.

Sybil Resistant
Because consumers select Indexers, they will select Indexers that they know to be distinct. Therefore, Indexers have no incentive to split their identity.

Efficiency
It is documented that in a general class of large markets, posted prices are the most efficient pricing mechanism.

Simple
The new indexing fees mechanism mirrors standard market mechanisms: Sellers post a price and consumers select what to buy based on product quality and cost.

Automated Indexer Selection

Although consumers may manually choose their own Indexers, those seeking a simpler experience could use an automatic Indexer selection algorithm. Such an algorithm may abstract away the need for consumers to screen individual Indexers. Instead, after a consumer provides their desired a) number of Indexers and b) sync speed preferences (high, medium, low), an automated Indexer selection algorithm can report the price and select the Indexers on behalf of the consumer.

An example algorithm works as follows. First, Indexers are sorted based on their sync-speed warranties and then categorized into high, medium, or low sync speed (based on pre-defined thresholds). Indexers that are dominated (i.e. those with a higher price for a lower sync-speed warranty than some other Indexer) are removed from selection. The requested number of Indexers in the chosen tier is then selected sequentially.

The first Indexer within the chosen sync-speed tier is selected uniformly at random. To provide robustness and guard against sybil attacks, subsequent Indexers are selected uniformly at random as long as their quality of service is not too strongly correlated with that of previously selected Indexers. In other words, robustness and redundancy rely on Indexer performance being uncorrelated; this ensures that if one Indexer fails, there is no loss in service. Furthermore, sybil identities are likely to have correlated quality of service, so selecting Indexers with low correlation also protects against sybil attacks.
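The example algorithm (tier filtering, dominance pruning, and correlation-aware random selection) can be sketched in Python. Everything here (the tuple layout, the 0.8 correlation threshold, and the `qos_corr` oracle) is an assumption for exposition:

```python
import random

def select_indexers(postings, tier, n, qos_corr, max_corr=0.8, seed=None):
    """Sketch of the example selection algorithm described above.

    postings: (indexer_id, price, sync_speed_warranty, tier) tuples.
    qos_corr: callable giving the historical quality-of-service
        correlation between two Indexer ids (a hypothetical oracle).
    """
    rng = random.Random(seed)
    # Keep only Indexers in the requested sync-speed tier.
    candidates = [p for p in postings if p[3] == tier]
    # Dominance pruning: drop any Indexer that is strictly more
    # expensive and strictly slower than some other candidate.
    undominated = [
        a for a in candidates
        if not any(b[1] < a[1] and b[2] > a[2] for b in candidates)
    ]
    selected = []
    pool = list(undominated)
    while pool and len(selected) < n:
        pick = rng.choice(pool)
        pool.remove(pick)
        # Reject picks whose QoS correlates too strongly with prior
        # picks (guards against sybils and correlated failures).
        if all(qos_corr(pick[0], s[0]) <= max_corr for s in selected):
            selected.append(pick)
    return selected

# indexer-B is pruned: indexer-C is both cheaper and faster.
postings = [
    ("indexer-A", 1.0, 100, "high"),
    ("indexer-B", 2.0, 90, "high"),
    ("indexer-C", 1.5, 120, "high"),
    ("indexer-D", 1.0, 50, "low"),
]
chosen = select_indexers(postings, "high", 2, lambda a, b: 0.0, seed=1)
```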

This example algorithm is admittedly crude and imprecise. However, it already gives consumers greater control over their quality of service than they have today and, thus, makes the protocol more efficient. Nevertheless, there is nothing preventing other agents from developing more sophisticated Indexer selection algorithms, incorporating both on-chain and off-chain data to improve consumer experience.

Frequently Asked Questions

In this section, we address some of the frequently asked questions that arise when replacing curation with indexing fees.

How do Indexers know which subgraphs to index?
They can be notified (e.g. via Discord, Telegram, POI radio, email) that they have been selected to index a subgraph at their posted price. We leave it open to the community to decide the best approach for notifications.

What is an Indexer’s incentive to index the subgraph if they are selected?
They will receive a payment equal to their posted price multiplied by the subgraph gas used to index.

Once selected, what actions must an Indexer take before making the indexing agreement valid?
An Indexer must opt-in to the agreement. This gives an Indexer the opportunity to verify the subgraph is available before accepting the indexing contract.

Can an Indexer post different prices for different subgraphs?
No, the price per unit of gas is network-wide. This is subject to change in the future.

Can an Indexer post a price based on a quantity other than subgraph gas?
Initially no, though this is subject to change.

Does this impact query fees and how Indexers are selected to serve queries?
No. Indexing fees only pertain to indexing and not serving queries.

Can an Indexer that is not selected to index a subgraph still index the subgraph and compete for queries?
Yes! Indexing fees do not preclude any Indexer from indexing a subgraph; they only provide extra incentive.

How much gas will a subgraph consume?
This is impossible to know exactly before indexing. You can find some statistics for common subgraphs that relate subgraph properties to gas costs in Appendix A.

Summary

Indexing fees are an improvement over the current curation mechanism. They add transparency and predictability, which beget simplicity and efficiency for both consumers and Indexers. Together with Graph Horizon (see amendment), indexing fees are general enough to allow for future services on the network and to scale, independent of issuance.

Detailed Specification

The detailed specification of the smart contracts for this GIP will be added at a later time.

Copyright Waiver

Copyright and related rights waived via CC0.

Amendment (12/1/23)

After considering community feedback, we are proposing an amendment to this GIP. We propose:

  1. having Indexing Fees exist alongside curation.
  2. removing the dependency on “Graph Horizon,” requiring only the staking and dispute mechanism described in Steps 3 and 6 above.

Importantly, this proposal now includes no change to issuance distribution.

Indexing Fees Alongside Curation

One of the original motivations for indexing fees was that consumers expressed frustration with how unpredictable it is to have their subgraphs indexed with a stable quality of service. Adding indexing fees alongside current curation gives those consumers an alternative channel to attract and incentivize Indexers. However, since indexing fees will now live alongside current curation, the disruption to current consumers who are comfortable with curation will be minimal. For complex cases, consumers can use both current curation/signaling and indexing fees to incentivize Indexers with both issuance and direct payments.

Break from “Graph Horizon”

Replacing curation with indexing fees required a new mechanism for issuance distribution. This was described as a part of “Graph Horizon.” However, now that the proposal is for indexing fees to exist alongside curation, the need to repurpose issuance is obviated. To reiterate, there will be no change to issuance and the distribution mechanism as part of this proposal.

The other main component of “Graph Horizon” was the collateralization mechanism, exactly as described in Steps 3 and 6 above. While in this proposal a collateral contract is used to ensure trusted indexing, the concept of collateralized transactions with dispute periods is far more powerful and can enable a wide array of services on The Graph network. General collateralized transactions are an avenue we will continue exploring, and we will communicate with the community as we gain insight.

So what about Graph Horizon?

Briefly, Horizon had two main components: 1) a general collateralization contract for secure transactions, and 2) a change to the issuance mechanism. Currently, the plan is to continue exploring additional uses for collateralized transactions independently from improving the efficiency of the issuance mechanism.


LOVE this. Curation seems to be a major point of friction and confusion for dapp developers. This would make things much more clear and efficient.

One question though, in the step 3 description it says:

The consumer opts into the agreement by depositing GRT into a smart contract that would then be transferred to the Indexer upon successful indexing.

Is there a way the GRT could be abstracted away so that the consumer could just transfer in USDC, DAI, etc. into the agreement, similar to the new billing process?


Thanks for your comment!

We will continue to explore options in future initiatives, as we are in current ones, to potentially abstract GRT away from the user. It would be technically possible in DIPs, but we have not yet worked out the details.


So, putting plainly:

We remove inflation, keep only query fees.

There’s some escrow nonsense and whatnot, but that’s how I read it. Do you expect customers to really pay anything similar to the existing 3% inflation? I’m a large-scale indexer (35M GRT allocated across 70 subgraphs), serving 1M queries per day at best. With an average query price of 1e-5, I will get 10 GRT per day? Instead of the current daily indexing reward of 10K GRT that is distributed between my delegators.

Can you please tell me I’m entirely wrong in understanding this post, and inflation rewards will stay until we get significant query volume?


No, this GIP does not propose changing the inflation rate. (Though I can see how it may have been easily misread.) Instead, this proposal would only change how rewards are distributed.

Presently, indexing rewards are paid out according to curation signal. If this proposal and the Horizon GIP are accepted, indexing rewards will subsidize collateralized security via Horizon. There will be more detail and rationale for this new distribution mechanism in an upcoming Horizon GIP, but the incentive to stake GRT to secure the network and receive indexing rewards remains.

Yes, nobody is proposing changing the issuance at this time and we are aware that such a suggestion would carry significant political risk. If there ever was such a GIP (none is being authored to my knowledge) it would probably be titled in big red letters “GIP to reduce/disable issuance” and not hidden inside another GIP.


Thank you for developing a proposal to innovate on the economic model of the graph ecosystem. I’m afraid that it is based on the wrong assumption that the cost of serving a subgraph is based on the compute it takes to index and serve it. I’d like to offer my perspective.

The cost of the average indexer is not based on compute units but is fixed. This is because the main cost is in operating the archive nodes or firehose. They are constantly growing in size, and the number of supported networks grows constantly as well.

Different archive nodes have different storage requirements and different profiles of being able to be amortized outside of providing services to the Graph network. While it is possible to connect graph-node to hosted RPCs, it is likely that such indexers have no way to operate competitively on the proposed market.

There are criteria like indexing speed but if the top 3 in the result list offer to distribute the resulting database to multiple geographic locations and offer similar speeds at the lowest cost then the only difference you can make as number 4 in the list is to buy a red background that makes you stand out and hope you can sustain your loss long enough to become one of the top 3.

Therefore the trick will be to host the infrastructure yourself and sell the spare capacity to competitors, who will not be able to win on the indexing rewards marketplace against competition that they subsidize by offsetting its infrastructure costs while receiving worse performance. Only the largest operations will be able to pay for that.

I think the takeaway is that, for the health of the ecosystem, it is critical not to give the customer a choice of whom to spend his money on, and not to give intermediaries the power to decide who will receive the indexing rewards.


This idea is a non-starter for The Graph. But, this idea is common so it is worth addressing.

I’ll start by outlining how consumer choice factors into the Edge & Node Gateway indexer selection algorithm when selecting an indexer for queries. Then I’ll show the problem that would arise if we removed consumer choice, and that this applies to indexing as well.

When selecting an indexer to serve a query, the consumer has preferences that they would like to maximize. These are things like:

  • How close to chain head the indexer is
  • The price the indexer is willing to serve the query for
  • The latency of the query
  • The reliability of the indexer and whether the indexer is experiencing an outage
  • The economic security (how much the indexer would be slashed for serving an incorrect result)
  • etc…

There’s more, but this is sufficient to make the point. Some of these properties are enforceable and based on on-chain data (eg: economic security). Some are enforceable and off-chain (eg: prices). And some are neither enforceable nor on-chain (eg: latency). The latter category is a problem because the only reliable source of information like latency and robustness is the consumer’s own observation of the data. This can be shown rigorously.

Therefore, if we want a consumer to be able to include information like how long a query will take to execute (based on predictions from historical observations), the only option is to give the consumer complete control over the choice. You can give the consumer the choice, or you can exclude important customer preferences like latency, geographic distribution and robustness. Do you think The Graph would survive against a decentralized competitor that offers better uptime, cost and performance? Which service will the consumers actually use?

Tying this back into indexing payments… from the point of view of a Sybil-resistant protocol like Curation that chooses indexers for you: a single Indexer in Antarctica with 300M GRT is as likely to be selected as three separate indexers with 100M stake in each US / Europe / Asia. Between those two options which do you think is more decentralized and would offer better performance? Do you think consumers can tell the difference? The protocol today cannot differentiate between these options and never will be able to without inserting a quality of service oracle - which would only serve the function of centralizing indexer selection… the exact opposite of what the protocol is going for which will succeed if indexers can compete on their merits of being able to deliver high quality of service and there is no centralized entity to orchestrate.

So, what is being proposed here is to look at what works for the indexer selection algorithm for queries (which maximizes quality of service and decentralization) and replicate that success for indexing. The goal of the indexer selection algorithm as it applies to indexing is largely to pick indexers that will offer the consumers the highest quality and best prices over the long term (factoring both indexing prices and latency and expected query prices and latency based on historical observation).
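As a rough sketch of what such a selection routine could look like - the scoring weights, field names and sampling policy here are all hypothetical illustrations, not the gateway’s actual algorithm:

```python
import random
from dataclasses import dataclass

@dataclass
class Indexer:
    name: str
    price_per_unit: float   # posted price in GRT per unit of work
    latency_ms: float       # observed from the consumer's own queries
    reliability: float      # observed success rate in [0, 1]

def score(ix: Indexer, w_price: float = 1.0, w_latency: float = 1.0) -> float:
    # Lower price and latency are better; reliability scales the whole score.
    return ix.reliability / ((1 + w_price * ix.price_per_unit)
                             * (1 + w_latency * ix.latency_ms / 100))

def select(indexers, k=3, seed=None):
    # Weighted sampling without replacement: strong indexers win most often,
    # but no single indexer monopolizes selection.
    rng = random.Random(seed)
    pool, chosen = list(indexers), []
    for _ in range(min(k, len(pool))):
        weights = [score(ix) for ix in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen
```

Weighted sampling rather than a strict top-k is one way to keep some traffic flowing to smaller but competent operators, which is the decentralization concern raised above.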

I think the solution to this is to offer paid queries for other kinds of data (including JSON-RPC, firehose, and others). This is internally referred to as the “world of data services” initiative and is a big part of what is planned for Graph Horizon. When that solution exists, I don’t think we will be better off with a system that offers Indexers a fixed payment regardless of how many subgraphs they index or how big the databases produced by those subgraphs are. What seems to be implied here is that, since costs are fixed, the best model would be to pay all Indexers the same amount regardless of how much value they produce…

3 Likes

I see your point. I’m questioning whether, in practice, those differentiators are actually very relevant to consumers.

How close to chainhead? As close as possible!
Latency? as little as possible!
Stake size? a lot!

I assume there is a threshold up to which people care less about details, but ultimately it’s whatever seems to be the majority opinion.

The price? Well… less!
Economies of scale, as outlined above, will allow the few largest Indexers to offer the (same) cheapest price.

Reliability?
This is exactly where the decentralized network should be the abstraction that allows the user to not have preferences. So far that detail is hidden behind the gateways and indexers basically inherit the brand of The Graph which is trusted by the customers. If individual indexers have to build their own brand it’s basically game over.

We know what the outcome will be, don’t we? You give the consumer the choice to delegate, and metrics to make educated decisions, and they delegate to the top. It’s a simple heuristic, and it will not change if we make the problem more complicated by adding variables.

The market for RPC queries is also cornered already. You have a few providers that can operate profitably and serve the majority of queries on every protocol. Individual operators cannot pay their server bills, downgrade the quality and lose queries because the selection algorithms are very sensitive.

On top of that, the few successful providers get grants from foundations to run public RPC infrastructure and subsidize their loss leaders to stay ahead. My experience with those markets is actually where I derived my opinion about a potential indexing marketplace. Not everything has to play out the same everywhere. I just want to point to the upcoming issue that we’ll have to solve if we go forward with this.

1 Like

The need to update or replace the existing curation system is evident, given its drawbacks for subgraph developers and inefficiencies in allocating indexing rewards. While I am open to new proposals for improvement, I am hesitant to form a complete opinion on this GIP without full knowledge of the forthcoming ‘Horizon’ proposal. Since the two are closely linked, I suggest that the Graph Council hold off on voting until we fully understand the impact on protocol tokenomics, including those within Horizon and other related proposals.

Regarding the technical aspects, the proposal introduces the ‘Unit of Work’ as a central concept. While promising, the details matter. Currently, it seems the unit of work only accounts for tasks carried out by graph-node, overlooking other essential tasks like running RPC nodes and maintaining a firehose block set. This limitation is even more significant when considering substreams-powered subgraphs that rely largely on external processing, away from graph-node. More information is needed to understand how this would fit into the concept of “world of data services”.

Sybil-protection is another crucial aspect. The proposal incentivizes an Indexer to operate multiple identities to collect additional rewards, something that’s even simpler with recent updates to indexer software. While the suggested technical solution of correlation detection has its merits, it also presents significant downsides and challenges, potentially leading to a never-ending game of catch-up. A long-term, sustainable solution may lie in a reputation-based system, but this must be carefully designed to ensure fairness, ideally based on publicly available, on-chain data. Maintaining decentralization is key; there should always be a viable path for new Indexers to join the system.

7 Likes

I think the crux of my argument may have been misunderstood. My reading of your reply is that obviously people care about (for instance) minimizing latency, so why make this a “preference” if everybody wants it? I think what you are driving at is that a common algorithm should be employed and canonized in the protocol.

What I’m actually saying is that consumers care about minimizing latency but unfortunately the protocol has no decentralized mechanism for incentivizing it (nor can such a method exist). Since the protocol can’t guarantee low query latency for the consumer, the consumer has to take this into their own hands by choosing low latency indexers based on their own subjective observations of latency. This is simply not something the protocol can do for them. Since properties like low latency are existentially important to many use-cases of The Graph, we cannot remove consumer choice and expect to retain customers.

2 Likes

My argument goes like this: people care about low latency, and they will always go for the lowest latency if it doesn’t cost them more than high latency. And since the winner takes all through economies of scale, that’s what they get by simply delegating to the top, which prices out everyone but the biggest 5-10 Indexers and some enthusiasts who tolerate operating at a loss until they give up.

The selection algorithm of the gateways currently shields high-latency operators, e.g. by allowing them to process queries that are not latency-sensitive because they only touch old data anyway. And those high-latency Indexers get the opportunity to sell their queries at the same price as the low-latency Indexers. But if the same users get to choose whether they prefer high or low latency, they will probably not be aware that they actually get by very well with high latency. They will just go with the best Indexers, which by definition cannot be the current undercapitalized crowd that makes up the majority of the network, waiting for the fun to get started with some real traffic.

One major problem for small Indexers will be reliability. You basically force them to sign QoS agreements that force them to build redundant setups. That immediately triples their cost, because they need every archive node twice, and also 3 admins in 3 time zones who can take shifts to react to a crashing archive node to fulfill the agreement. Right now it’s okay to not be online. The gateways will balance that out. You don’t have a direct customer relationship. You lose opportunity and may get penalized by the selection algorithm, but it’s not the end of your operation if you sleep through an incident.

This is exactly the environment individual node operators need to be able to compete, or there is no ground on which they can exist. The decentralized network, or the service it implements for the customer, has to be tolerant of individual node failures without penalizing the failing nodes too much. If people start to host stuff at home, for instance, you deal with electricity shortages, failing internet connections and potentially hardware failures that can only be fixed after a day or two. But those are the people you need to motivate in order to get resiliency against nation-state actors and also to win against the cloud.

What’s the endgame otherwise?

One operation in each of the 3-4 big clouds? Why not just have the clouds offer the service directly as a product? AWS-hosted subgraphs, anyone? We don’t change the status quo like that. It’s just more of the same. It’s obviously the best situation for the customer, because he doesn’t introduce any new assumptions, since he already chose to trust the life of his company to his cloud. And he doesn’t have to pay for traffic in the same cloud region, etc.

But that’s also true for credit cards. For most people they work perfectly. We don’t need that decentralized stuff.

3 Likes

This GIP seems to propose removing the signal mechanic without addressing how indexing rewards will be determined. What is being proposed on this?

L1 curation had its obvious pain points, but have we also heard similar feedback on curation in v2 with the flat curve?

Does the introduction of Indexing Fees need to disable signal as the determining factor for indexing rewards?

3 Likes

Signal will not be a factor in the distribution of indexing rewards. There is no distribution from a pool of rewards. You get a direct payment from a customer routed through a protocol escrow contract.

Since it’s not “mining rewards” or “staking rewards” anymore in that case, I assume that I’ll have to charge VAT to the customer who chooses to buy “indexing services” from me. That implies that I have to issue him invoices that would allow him to deduct taxes later.

Therefore I’d need to KYC my customers properly. For one, I cannot accept crypto payments without KYC according to planned regulation. But right now I’d already need to get a billing address, and a tax number if it’s a company, plus an email address to send him my invoice with my address and tax number, as required by law.

So much fun.

Of course you could argue that in reality no one does it properly so why should anyone care. That’s up to everyone individually I’d say.

Inflation rewards will still exist somehow and are subject to a separate proposal by the name Horizon.

1 Like

Thank you for all of your comments. We, as the proposers, are taking the comments very seriously and are working on addressing them. Please forgive the slow response; more information and communication will be coming soon.

4 Likes

Thanks for putting this proposal together @justin and @howard.

I am largely in support of this proposal, for several of the benefits listed, but with one important caveat:

I believe this proposal should be implemented as an additional way of compensating indexers, living side-by-side with the existing indexing reward subsidy (or an improved version of it), rather than attempting to replace it altogether as the primary way of incentivizing Indexers to index a given subgraph.

This unlocks the main desiderata outlined above, while avoiding some of the drawbacks of replacing the indexing reward subsidy as a mechanism for incentivizing useful work, as has been hinted at.

It also adds more payment flexibility to the network at a time when it is preparing for supporting more data services, which may have different usage patterns that benefit from different indexer compensation models.


Benefits of keeping an indexing reward subsidy tied to useful indexing work.

I understand this proposal depends on Horizon to specify the changes to the indexing reward subsidy, but it has been hinted that the subsidy would be paid out indiscriminately to anyone who stakes GRT rather than being allocated towards Indexers provably performing useful work.

Before making such drastic changes, it is worth keeping in mind the following benefits of the existing indexing reward subsidy:

  1. Incentivizes protocol security. Until The Graph implements real-time ZKP-based validity proofs for all its data services, it will continue to rely on optimistic validity guarantees. Such systems have a 1-of-N honest minority security assumption. This means that every new Indexer validating a subgraph provides a spillover benefit of improved security to every other user of that subgraph, including users that are not directly paying that Indexer for indexing or query services.
  2. Promotes decentralization. As @czarly notes, there is a benefit to decentralization of having any Indexer be able to permissionlessly process a subgraph in exchange for rewards. If the only direct incentive for indexing is fees paid at the discretion of the consumer, then it is possible that Indexers with higher brand recognition or economies of scale are able to capture a disproportionate amount of the indexing fees. This is especially true since, unlike query fees, this proposal outlines long-lived relationships between a Consumer and an Indexer, and thus differentiation on the basis of infrastructure quality is more likely. An important feature of a competitive marketplace is fungibility of the service providers.
    • Centralization at the indexing market layer would likely have downstream impact on competition at the query market layer, as Indexers that were selected for indexing rewards would be able to price queries more competitively than Indexers that need to cover their fixed costs of indexing through query fees alone.

As an aside, it’s worth noting the pivotal impact that indexing rewards have had in building out The Graph’s Indexer community. If indexing rewards were reallocated to any holder of GRT, as opposed to those who have made the substantial investment of time, effort and money to ramp up as Indexers, I would expect the overall subsidy directed towards Indexers performing useful work to substantially decline.


Additional critiques

While I am generally in support of this proposal as an add-on to the protocol, I would also offer the following additional critiques of the proposal as written:

Overstates the benefits of the proposal with respect to reducing cost and quality-of-service uncertainty.

The fundamental problem that the network has today with respect to indexing cost uncertainty is that it is difficult to predict the cost of indexing a subgraph without actually doing the work of indexing it.

Today, this problem is felt by both Indexers and Curators:

  • Indexers do not know a priori whether the projected indexing reward for a subgraph will be adequate to cover the future costs of indexing the subgraph while leaving room for a modest profit.
  • Curators don’t know whether signaling a certain amount of GRT will be adequate to incentivize indexing of the subgraph in perpetuity, given the uncertain future cost of indexing (as well as other variables).

At first glance, this proposal appears to solve the problem for Indexers, by allowing for more predictable pricing based on well-defined units of work. However, the primary effect of this proposal is to shift the burden of the indexing cost uncertainty from Indexers to developers.

Developers (or other Consumers in the proposed indexing fee market) will need to adequately predict future computational costs of indexing or else risk having their escrowed amount drained and experience an outage on their subgraph. This outcome is arguably more likely than in the current design, where a developer can signal enough GRT to have a margin of safety in their indexing rewards, such that they would see the number of unique Indexers on their subgraph gradually decline, one-by-one towards zero, rather than have one or more bilateral indexing agreements all fail at once.

On the one hand, shifting the burden of indexing cost uncertainty from Indexers to developers makes a certain kind of sense: after all, developers have more control over the design of the subgraph and have more domain knowledge over the expected usage patterns of the underlying smart contracts. On the other hand, the only way to predict the cost of processing a subgraph or substreams today is to deploy fairly complex infrastructure, which is precisely the type of work that developers are looking to outsource by coming to The Graph in the first place.

A more fundamental solution to the indexing cost uncertainty problem, in my opinion, would be to enable static analysis, dynamic sampling or other statistical techniques to predict the indexing cost of any subgraph, which could be done through command line tools or in products like the Subgraph Studio to help both developers and Indexers make better decisions.

Economic efficiency, to the exclusion of other objectives, is not desirable

It could be said that crypto and web3 explicitly reject, or at the very least tradeoff, the narrow definition of economic efficiency above in favor of decentralization and anti-fragility.

This is evident in the block size debates in Bitcoin and the gas limits in Ethereum. In both instances, the number of mutually beneficial transactions is intentionally limited in order to preserve the public benefit of long-term robustness and decentralization of the system.

In the greater macro context, we’ve seen that short-term maximization of mutually beneficial transactions in the global supply chain introduces central points of failure and systemic risks, as was experienced during the Covid lockdowns, after many critically important industries had outsourced the bulk of their manufacturing to China.

It’s clear that economic efficiency alone is an inadequate goal.

A challenge I see in adopting more expansive goals is that much of the standard economic toolkit seems to center around a narrow view of economic efficiency as the single objective, and such an objective lends itself to rigorous and analytically tractable models. The benefits of decentralization, however, as well as the impact of mechanism-design choices on decentralization, appear to be fuzzier, or at the very least less studied.

While it can be tempting to stick to what is familiar and tractable, I think every proposed economic change to The Graph should incorporate some analysis of the expected impacts on decentralization, even if only a qualitative assessment as offered by @czarly above. And these impacts must be balanced against any expected gains in economic efficiency.

5 Likes

I appreciate the effort that has been invested in tackling areas of improvement in the network, the benefits are well articulated. I share the major concern of the weakening effects on decentralization that have been expressed in prior posts. We had broader discussions around stake centralization in the protocol two years ago. Back then, we already had a high concentration in the protocol where the top 5 Indexers had about 50% of total stake and this was broadly recognized as a critical issue within our community:


There were a lot of discussions and ideas presented to address stake centralization. In this forum post, we discussed proposals such as a decentralization delegation tax, new delegation rejection for Indexers or Indexer decentralization thresholds. We also discussed changing the static 16x delegation cap in order to achieve more stake decentralization. Aside from a minor update to the Indexer table UI in the Explorer, these ideas have largely remained on the drawing board.

@Brandon suggested an update to GIP-0058 aimed at mitigating the anticipated centralization effects. We could broaden that approach and seize this moment to incorporate additional measures that bolster Indexer and stake decentralization, even if they don’t directly pertain to GIP-0058. For instance, revisiting the delegation cap discussions from yesteryears, we could contemplate lowering the cap below 16x. This would put more of the larger Indexers at or near the cap limit and encourage future delegations to gravitate towards smaller Indexers, who generally have more cap room. Naturally, such discussions and proposals would happen in separate threads and end up in different GIPs, but the Council’s approval could be sought as a package together with GIP-0058.

I acknowledge that introducing seemingly unrelated elements into a focused discussion is unconventional and harder to navigate. However, the point of the matter is that past decentralization initiatives garnered significant community support but failed to materialize into actionable outcomes. By bundling GIP-0058 with other proposals, we might achieve an overall trade-off where centralization dynamics from GIP-0058 are sufficiently offset by other decentralization-supporting changes.

2 Likes

TL;DR in response to @Brandon’s insightful post:

We agree on the two major points:

  1. Simultaneous deployment is a plausible path forward
  2. Uncertainty can be reduced via statistical exercises

Importantly, the points of agreement are actionable! There were some unsubstantiated claims in the economic intuition that we want to call out so that they don’t get propagated without proper foundation. There is a bit of a communication gap regarding efficiency that we will also try to clear up, but those are secondary to the main action items.

I believe this proposal should be implemented as an additional way of compensating indexers, living side-by-side with the existing indexing reward subsidy (or an improved version of it), rather than attempting to replace it altogether as the primary way of incentivizing Indexers to index a given subgraph.

This was something we originally thought wouldn’t work, but it might be worth revisiting. The tricky part is that we have a good idea of how each system works independently. However, if they are deployed simultaneously, there might be some interaction. If we can make sure that the two systems don’t interfere with one another, I think letting the participants choose the indexing mechanism is a plausible path forward.

As @czarly notes, there is a benefit to decentralization of having any Indexer be able to permissionlessly process a subgraph in exchange for rewards. If the only direct incentive for indexing is fees paid at the discretion of the consumer, then it is possible that Indexers with higher brand recognition or economies of scale are able to capture a disproportionate amount of the indexing fees. This is especially true since, unlike query fees, this proposal outlines long-lived relationships between a Consumer and an Indexer, and thus differentiation on the basis of infrastructure quality is more likely. An important feature of a competitive marketplace is fungibility of the service providers.

The meta-point of enabling decentralization is well taken and understood. However, there are some unsubstantiated claims here that are misleading. Three things in particular have not been substantiated: a) the insinuation that economies of scale are bad for a decentralized environment; b) the claim that fungibility of service providers is an important feature (or even a feature at all) of a market (decentralized or otherwise) that operates at the socially optimal competitive equilibrium; and c) the insinuation that off-chain trust via brand recognition is counter to the ethos of decentralization. I don’t think this is the right forum to discuss them (as they are not first-order important to the GIP), but here are some references on topics like the theory of socially optimal product differentiation, related to fungibility (http://faculty.washington.edu/mfan/is582/articles/Lancaster1975.pdf), or the cost of trust, related to why there are benefits to allowing off-chain trust (“The Role of Market Forces in Assuring Contractual Performance” on JSTOR). But the main point is that we agree: users that demand a decentralized environment for data services should be provided the best one possible.

A more fundamental solution to the indexing cost uncertainty problem, in my opinion, would be to enable static analysis, dynamic sampling or other statistical techniques to predict the indexing cost of any subgraph, which could be done through command line tools or in products like the Subgraph Studio to help both developers and Indexers make better decisions.

This is still very much a possibility, and we agree that this should be provided to the community, regardless of the mechanism. This could also be a gateway service.

It’s clear that economic efficiency alone is an inadequate goal.

I want to step out and say it doesn’t seem like it is efficiency that you are objecting to, but rather a certain narrow notion of efficiency. We did not define efficiency for brevity’s sake, but what it really means is the long-run, time-discounted sum of total benefits minus total costs for all participants. So if, as czarly says, the change would raise costs for Indexers, that should be included in the efficiency calculation!

Furthermore, I don’t think anyone will dispute that decentralization can be expensive (at least in the short term), but it does indeed provide the benefits of robustness, unstoppability, censorship resistance and resilience, among others. The goal of maximizing efficiency is to provide an environment where the costs to achieve those characteristics are minimized while the benefits are maximized. However, users are heterogeneous in their demand for decentralization, so the best decentralized solution (the one that maximizes total benefits minus costs) is the one in which Indexers optimize their resources to provide the best service at the lowest cost to them, and consumers can choose the degree of decentralization they want at clear and stable prices. This proposal is directed toward that goal. Or in other words, an efficient decentralized platform is better than an inefficient decentralized platform :slight_smile:
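For concreteness, the notion of efficiency described above can be written as maximizing the discounted sum of net benefits across all participants (the notation here is mine, not from the GIP):

```latex
W = \sum_{t=0}^{\infty} \delta^{t} \sum_{i} \left( b_{i,t} - c_{i,t} \right)
```

where \(\delta \in (0,1)\) is the time-discount factor and \(b_{i,t}\), \(c_{i,t}\) are participant \(i\)’s benefits and costs in period \(t\). The benefits term includes whatever value participants place on decentralization, and raising Indexer costs (czarly’s point) lowers \(W\) directly through the cost term.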

2 Likes

@justin I don’t know what kind of degree is required to participate in this discussion, but I’ll just shut up and enjoy my life trying to beat the big ships on the decentralized RPC marketplaces, which is where my unsubstantiated opinions were derived from after 2 years of actively selling RPC services.

However, there are some unsubstantiated claims here that are misleading. Three things in particular have not been substantiated: a) the insinuation that economies of scale are bad for a decentralized environment; b) the claim that fungibility of service providers is an important feature (or even a feature at all) of a market (decentralized or otherwise) that operates at the socially optimal competitive equilibrium; and c) the insinuation that off-chain trust via brand recognition is counter to the ethos of decentralization.

a) bitcoin mining
b) amazon beats ebay (because of fungibility of suppliers in the backend)
c) binance (puts its brand on any validator and instantly absorbs all the delegations)

This list is not a complete statement but it’s what I can offer.

1 Like

Thank you @Brandon for joining the discussion. I also think that no one could stop anyone from incentivizing the indexers in additional ways out of protocol already. The question remains why this doesn’t happen. Potentially the market for that is already captured by other products?

Following the ideas from The Innovator’s Dilemma, it’s likely that it’s impossible to beat the highly optimized supply chain of existing solutions for existing enterprise customers, which gets constantly refined to defend their market. I highly recommend the entertaining read.

The proposed solution in that classic book is that we need another customer - one that is enabled by the product we provide, regardless of all the downsides - and then we grow from there, essentially capturing the market upwards from the least valuable customer. The upscale solutions happily abandon those low-value customers one after another to focus on the higher-margin customers; they are forced to by their cost structure and in order to stay competitive in the market segments they chose to serve.

In my technically blurred view, there is exactly one metric where The Graph is potentially unbeatable, which is latency. Imagine we put one Indexer in every city on the planet. We can respond to every user device within 10 ms. The user doesn’t pay for bandwidth at the country level. Local Indexers could optimize their “cache” based on demand, like the local caches of FaceTube and Netflix.

Providing such a runtime for backends would be a unique selling point. By providing RPC services on top, we could become the Cloudflare of Web3. In fact, Cloudflare has a product that allows developers to run serverless code directly on its edge locations, which is not too far off, and apparently it’s useful to some. But if you ask the current customers of the current value chain for their preferences, you won’t see this emerging, because they will ask us to be more like what they can already buy elsewhere, just better.

2 Likes

I don’t know what kind of degree is required to participate in this discussion but I’ll just shut up and enjoy my life

Most importantly @czarly, I would like to apologize if my response came off as elitist or exclusionary. That was not the intention and we welcome your opinion. Forums like these can be tone-deaf and dehumanizing and I apologize if my response came off as cold.

a) bitcoin mining

b) amazon beats ebay (because of fungibility of suppliers in the backend)

c) binance (puts its brand on any validator and instantly absorbs all the delegations)

Regarding your three examples, I think the devil is in the details. I am not saying that all forms of a), b), c) are good. I am saying that it hasn’t been shown that ALL versions of a), b) and c) are bad. In reference to your examples:

a) Of course, if one firm dominated all the mining, that would be bad, but as many firms innovate and reduce their costs, the average cost to mine each block goes down (ignoring new entrants). There is a trade-off, of course, but having many low-cost participants seems in line with decentralization - but maybe that’s just my interpretation.

b) Again, the devil is in the details. I do think that Indexers with identical qualities should be fungible/treated equally. I was implying that one Indexer with killer hardware, strong uptime and built-in redundancy shouldn’t be treated (maybe compensated?) the same as an Indexer on budget hardware with high latency and lower uptime. I suppose that is up for debate too, but I wanted to be clear.

c) The example you gave is indeed a case we want to avoid. But on-chain trust can also be expensive, so if we can combine the two for a better system, that seems like a win. I’m not saying we want purely off-chain trust, but simply including off-chain information does not seem a priori bad. The example that comes to mind in The Graph is that Indexer characteristics are almost entirely off-chain right now (latency, uptime, etc.). Including that information to establish “trust” doesn’t seem bad for decentralization, but again, that is also debatable.

Thank you again for your thoughts, they are taken sincerely.

3 Likes