```
GIP: 0034
Title: The Graph Arbitrum deployment with a new rewards issuance and distribution mechanism
Authors: Pablo Carranza Vélez <pablo@edgeandnode.com>, Ariel Barmat <ariel@edgeandnode.com>
Created: 2022-06-01
Updated: 2022-09-13
Stage: Candidate
Discussions-To: https://forum.thegraph.com/t/gip-0034-the-graph-arbitrum-deployment-with-a-new-rewards-issuance-and-distribution-mechanism/3418
Category: "Protocol Logic", "Protocol Interfaces", "Economic Parameters"
Depends-On: GIP-0031
Implementations: Original implementations in https://github.com/graphprotocol/contracts/pull/571 and https://github.com/graphprotocol/contracts/pull/582/files - superseded by https://github.com/graphprotocol/contracts/pull/701
Audits: See https://github.com/graphprotocol/contracts/pull/710
```

# Abstract

This GIP introduces a new deployment of The Graph Protocol to the Arbitrum One Layer 2 blockchain. It proposes a plan to run the protocol on Ethereum mainnet (L1) together with the protocol on L2 instead of fully migrating in one go. The community would gradually move to L2 by starting with an experimental phase, in which indexing rewards are disabled, and then slowly increasing the amount of rewards in L2. To enable this, this GIP also proposes a change to how rewards are issued and distributed: instead of minting on the RewardsManager when allocations are closed, rewards would be pre-minted with a drip function in a new Reservoir contract. This function must be called at least once per week, and sends a configurable amount of the rewards to L2.

# Motivation

With rising gas costs, there is a growing interest in the community for a Layer 2 scaling solution. As mentioned in GIP-0031, there have been forum discussions highlighting this need, and conversations among core dev team members pointing towards optimistic rollups, and particularly Arbitrum, as a reasonable first step in this direction. This is a big change, however, and not risk-free, so it makes sense to approach it with caution and gradually mitigate risk along the way. For this reason, we’d like to explore an Arbitrum protocol implementation that initially works with no indexing rewards and is therefore useful for experimental subgraphs. This would be a first step towards running the protocol with rewards on Arbitrum, assuming the experimental phase is successful. It would be beneficial, however, for the rewards distribution mechanism to be implemented from the beginning, so that governance can gradually increase rewards on Arbitrum as the community gains confidence in the L2 network.

# Prior Art

We’ve looked closely at the way Livepeer carried out their L2 migration, as described in LIP-73. They went for a full migration, whereas we’re proposing a more gradual approach where L1 and L2 coexist.

The rewards accrual and drip system proposed here is inspired by the way the Maker Rates Module works.

# High Level Description

We propose a gradual move to L2 by deploying the new L2 protocol in parallel to the existing L1 deployment. Initially, this new deployment would be experimental because indexing rewards would be disabled. Eventually, governance can enable rewards by increasing the configurable fraction of the total rewards that are distributed in L2. This new network would be deployed on Arbitrum One (after testing on Arbitrum Goerli).

At a high level, the L2 network will mostly work like the existing L1 network. Most of the contracts will be deployed unchanged, so staking and curation can be done in the same way as in L1. To participate in the L2 network, users can move GRT to L2 using the bridge proposed in GIP-0031, though future GIPs can propose ways to facilitate migration of staked tokens or subgraphs and curated signal.

The main change to support the L2 deployment will be a redesign of how indexer rewards are issued and distributed both in L1 and L2. Currently, rewards for subgraphs and allocations are computed and snapshotted when signal or allocations change, and then minted and immediately distributed to indexers and delegators when an allocation is closed. This is shown in Figure 1.

Here we propose changing this to a periodic drip function that can be called by Indexers and potentially other addresses whitelisted by governance, and mints tokens to cover rewards for a period of time (one week) in the future. This drip function will exist on a new contract called a “Reservoir”, particularly the “L1Reservoir”, which will hold the tokens until the RewardsManager pulls them when an allocation is closed.

On L2, the L2Reservoir will not include a drip function. Instead, whenever the drip function is called on L1, the L1Reservoir will send a configurable fraction of the rewards to L2 using the GRT bridge, together with calldata to call a `receiveDrip` function on the L2Reservoir. This will update the variables so that accrued rewards can be computed correctly. This fraction (`l2RewardsFraction`) will be set to zero throughout the experimental phase, and can then be increased gradually by governance as the community gains confidence in the L2 solution.

Figure 2 shows the way the rewards issuance (now separated from distribution) would work in the L1+L2 scenario.

The drip function can provide a keeper reward for whoever calls it, to incentivize Indexers to call it periodically without incurring additional gas costs.

Once rewards have been issued, and until the reservoir runs out of funds, indexers can close allocations as usual. The RewardsManager will query the reservoir on its corresponding layer to calculate the rewards owed for a particular allocation, and then pull the rewards from the reservoir so that the Staking contract can distribute them. This is illustrated in Figure 3.

After deploying the Arbitrum bridge, it should be safe to deploy the L2 network including the L2Reservoir, without necessarily updating the L1 side at the same time, since rewards on L2 will be zero for some time. The L1Reservoir deployment and L1 RewardsManager update can happen at a later time, allowing us time to test each component separately.

# Detailed Specification

## Payments and query fees

We would deploy a new AllocationExchange with enough funds to pay for query fees that are served through L2 allocations. Indexers will need to identify that a voucher they receive corresponds to an L2 allocation, and claim it through the Arbitrum AllocationExchange.

## Epoch management and block numbers

The EpochManager contract would be deployed to L2 as-is, and L2 would follow its own epoch count. Off-chain components like the Block Oracle should follow the epoch and epoch start block reported by the EpochManager on each layer. This would use the `block.number` as seen by the contract, which follows the L1 block number with a precision of around 10 minutes (or up to 24 hours in a Sequencer downtime scenario).

Epoch length would initially be the same as in L1, but it might change in the future.

The existence of two different block numbers (`block.number`, which follows L1, and the RPC block number, which reports a separate L2 block number) means that off-chain components should be careful to use the correct number for each purpose:

- To compute epochs, e.g. to decide when to close allocations, use the block number reported by EpochManager, or `block.number`.
- To compute the Arbitrum chain head when indexing Arbitrum, use the block number reported by the Block Oracle, which should follow the L2 block number reported by the RPC node.

## Governance

The multisig for the Graph Council has been deployed to Arbitrum at this address, and the multisig for the Arbitrator has been deployed to this address. Governance actions on L2 can be performed directly on L2 through these accounts.

## Rewards calculation, issuance and distribution in detail

What follows is a detailed description of how rewards calculation works at the time of writing this GIP, for background, and then a description of the changes proposed to support L2.

We will use the following notation:

- $\rho$: rewards per signal
- $R$: rewards
- $p$: GRT total supply, or base amount for the inflation calculation
- $\sigma$: signal, i.e. tokens on the Curation contract
- $\omega$: allocated tokens, i.e. tokens from indexers’ stake that are allocated to a subgraph
- $\gamma$: rewards per allocated token
- $r$: issuance rate (including the +1)
- $S$: set of all subgraphs
- $A$: set of all allocations

We’ll be using subscripts for subgraphs and superscripts for allocations (e.g. $R_i^k$ is “rewards on subgraph $i$ and allocation $k$”). When mentioning snapshots, since signal and allocations are discontinuous, we use the superscript minus sign (as in $\rho_i(t_{\sigma_i}^-)$) to denote a left-hand limit, i.e. a value snapshotted right before the signal or allocation changed.

Values are expressed as functions of time t in blocks.

### Background: current L1-only design

Rewards are calculated and snapshotted in two dimensions:

- Rewards per signal, updated when signal for a subgraph changes, and
- Rewards per allocated token, updated when allocations for a subgraph change

Reward distribution follows the approach from Batog et al but it is done twice: first, treating each subgraph as a staker for the total rewards (computing rewards per signal), and secondly, treating each allocation as a staker for the subgraph’s rewards (computing rewards per allocated token).

If $t_\sigma$ is the last time signal was updated on any subgraph, every time we want to calculate rewards, we should calculate the new value for accumulated rewards per signal as:

$$\rho(t) = \rho(t_\sigma) + \frac{p(t_\sigma)\left(r^{t - t_\sigma} - 1\right)}{\sigma_T(t_\sigma)}$$

And then we use this to compare the current value with a particular snapshot:

$$\Delta\rho_i(t) = \rho(t) - \rho_i(t_{\sigma_i}^-)$$

Where $\rho_i(t_{\sigma_i}^-)$ is the snapshot for the subgraph’s accumulated rewards per signal, last updated when signal for that subgraph changed ($t_{\sigma_i}$).

Then the new rewards for subgraph $i$, accumulated since $t_{\sigma_i}$, are:

$$\Delta R_i(t) = \sigma_i \, \Delta\rho_i(t)$$

And the accumulated rewards for subgraph $i$ (on the signal dimension) will be:

$$R_i(t) = R_i(t_{\sigma_i}^-) + \Delta R_i(t)$$

Now, when we want to compute the rewards for an allocation, we look at the new rewards since the last allocation change for the subgraph (which we’ll say happened at time $t_{\omega_i}$):

$$\Delta R_i^\omega(t) = R_i(t) - R_i(t_{\omega_i}^-)$$

Note we snapshotted $R_i(t_{\omega_i}^-)$ right before the last allocation change happened for the subgraph.

Then, we can compute the new rewards per allocated token:

$$\Delta\gamma_i(t) = \frac{\Delta R_i^\omega(t)}{\omega_i(t)}$$

(This only works because $\omega_i(t) = \omega_i(t_{\omega_i})$, given that $t_{\omega_i}$ is the last time the allocations for $i$ changed.)

Which gives us the final value of rewards per allocated token on this subgraph:

$$\gamma_i(t) = \gamma_i(t_{\omega_i}^-) + \Delta\gamma_i(t)$$

Now, when we take the rewards for a particular allocation $k \in A_i$, that was created at time $t^k$, we can compute the rewards like this:

$$R^k(t) = \omega^k \left(\gamma_i(t) - \gamma_i(t^{k-})\right)$$

Note that $\gamma_i(t^{k-})$ had to be computed and snapshotted when the allocation was created (with the snapshot computed before the allocation is added to the pool).

When an allocation is closed, these rewards are minted and sent to the indexer (and delegators, if any).

All of these calculations are currently done inside the RewardsManager contract, particularly in the `onSubgraphSignalUpdate` and `onSubgraphAllocationUpdate` functions that are triggered when curation signal or allocations change.
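As an aid to intuition, the accumulator scheme above can be sketched as a toy Python model. This is a hypothetical simplification (one subgraph, one allocation, floating point instead of the contracts’ fixed-point math); the class and method names are illustrative, not the actual contract interface:

```python
class RewardsModel:
    """Toy model of the two-level accumulators: rewards per signal (rho),
    then rewards per allocated token (gamma), for ONE subgraph/allocation."""

    def __init__(self, p, r):
        self.p, self.r = p, r      # base supply and issuance rate (incl. the +1)
        self.t_sigma = 0           # last block when signal changed anywhere
        self.rho = 0.0             # accumulated rewards per signal
        self.sigma_T = 0.0         # total signal
        self.sigma_i = 0.0         # signal on subgraph i
        self.rho_i_snap = 0.0      # rho snapshot for subgraph i
        self.R_i = 0.0             # accumulated rewards for subgraph i
        self.R_i_snap = 0.0        # R_i snapshot at last allocation change
        self.omega_i = 0.0         # tokens allocated to subgraph i
        self.gamma_i = 0.0         # accumulated rewards per allocated token

    def rho_at(self, t):
        # rho(t) = rho(t_sigma) + p * (r^(t - t_sigma) - 1) / sigma_T
        if self.sigma_T == 0:
            return self.rho
        return self.rho + self.p * (self.r ** (t - self.t_sigma) - 1) / self.sigma_T

    def on_signal_update(self, t, new_sigma_i, new_sigma_T):
        rho_now = self.rho_at(t)
        # Subgraph i accrues sigma_i * (rho(t) - snapshot), then re-snapshots.
        self.R_i += self.sigma_i * (rho_now - self.rho_i_snap)
        self.rho_i_snap = rho_now
        # The base supply follows the issuance exponential.
        self.p *= self.r ** (t - self.t_sigma)
        self.rho, self.t_sigma = rho_now, t
        self.sigma_i, self.sigma_T = new_sigma_i, new_sigma_T

    def on_allocation_update(self, t, new_omega_i):
        # Refresh R_i first, then fold the new subgraph rewards into gamma.
        self.on_signal_update(t, self.sigma_i, self.sigma_T)
        if self.omega_i > 0:
            self.gamma_i += (self.R_i - self.R_i_snap) / self.omega_i
        self.R_i_snap = self.R_i
        self.omega_i = new_omega_i
```

An allocation $k$ opened at $t^k$ and closed at $t$ is then paid $\omega^k \, (\gamma_i(t) - \gamma_i(t^{k-}))$; in this single-subgraph model that equals the full issuance for the period.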

### Proposed calculation of rewards for L2

Once we move to L2, since we want to avoid minting on L2 (see the related section below for the rationale), we need to decouple rewards issuance from distribution: mint on L1, send a fraction of the rewards to a Reservoir on each layer, and let the RewardsManager use those already-minted tokens as needed.

So let’s define $\lambda(t)$ as the proportion of total rewards that is sent to L2 at time $t$. In the contract code, this will be stored in the variable `l2RewardsFraction`.

Let’s also define $t_0$ as the last time the rewards drip function was called (expected to happen at least once per week). This drip function will be described below.

$\lambda$ is set by governance to a value between 0 and 1, and should be changed over time to incentivize the move to L2.

We can then define a global rewards function $R(t)$, and total rewards functions for L1 and L2, $R1(t)$ and $R2(t)$ respectively:

$$R(t) = R(t_0) + p(t_0)\left(r^{t - t_0} - 1\right)$$

$$R1(t) = R1(t_0) + (1 - \lambda(t_0))\, p(t_0)\left(r^{t - t_0} - 1\right)$$

$$R2(t) = R2(t_0) + \lambda(t_0)\, p(t_0)\left(r^{t - t_0} - 1\right)$$

Note $R(t) = R1(t) + R2(t)$.

Besides this, we propose redefining the meaning of $p(t)$. Now that we have an L2 deployment, it would be quite complex to sync the number of GRT burned in L2 back to L1. Moreover, now that we mint tokens in advance, the total supply at a certain time will represent the desired supply at a *future* time, because we’ve minted tokens that are to be distributed eventually, using a drip function as described below.

So our proposal is to define $p(t)$ as: “the total supply of GRT produced by accumulating rewards up to time $t$”. This definition excludes burnt tokens and any tokens that were minted to cover future rewards; we can therefore compute it as follows:

$$p(t) = p(t_0)\, r^{t - t_0}$$

Where we initialize $p$ with the real GRT total supply at the time the Reservoir contract (described below) is deployed, and then snapshot it at every new $t_0'$ by adding the computed value of accumulated rewards:

$$p(t_0') = p(t_0) + \Delta R(t_0', t_0)$$

This makes the issuance always follow the same exponential, so while the issuance rate stays constant we could actually skip the snapshot and always use the same $t_0$. It is still convenient to snapshot on every drip, however, to prevent numerical errors when computing the exponential in fixed-point notation.
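As a quick numeric check (illustrative values, not protocol parameters), accumulating the supply snapshot drip by drip lands on the same exponential as computing it in one step:

```python
r = 1.0001         # per-block issuance rate, including the +1 (illustrative)
p0 = 10_000_000.0  # supply when the Reservoir is initialized (illustrative)

# Single exponential over 300 blocks:
direct = p0 * r ** 300

# Snapshot every 100 blocks: p(t0') = p(t0) + Delta R(t0', t0)
p = p0
for _ in range(3):
    p = p + p * (r ** 100 - 1)

# p and direct now agree up to floating-point error.
```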

### Proposed drip function

As we mentioned above, the new L1Reservoir contract will have a `drip()` function. This function works as follows:

- The last time the function ran, it minted tokens for all rewards up to time $t_1$. This was saved in contract storage, together with $t_0$, $p(t_0)$ and $R1(t_0)$. (During initialization, these variables are set to their appropriate values, and the appropriate amount of GRT is minted as well.)
- We compute an error value $\epsilon = \Delta R(t_1, t_0) - \Delta R(t, t_0)$ using the stored $t_0$, $t_1$, $p(t_0)$. This is the number of tokens for *future* rewards that have already been minted (if $t < t_1$, i.e. drip was called early), or the rewards for the interval $[t_1, t]$ that should’ve been minted but weren’t (if $t \geq t_1$, i.e. drip was called late). Note $\epsilon$ can be negative in this latter case.
- Set and store $t_0 = t$ and $t_1 = t_0 + (7 \times 7200)$. This new $t_1$ is calculated to produce rewards for the next week (counting 7200 blocks per day, as it will be post-Merge).
- Compute $n = \Delta R(t_1, t_0)$ (using the updated $t_0$, $p(t_0)$ and potentially updated $r$). This is the total rewards to be distributed from now up to the new $t_1$.
- Compute $N = n - \epsilon$. This is the amount of tokens that need to be minted to cover rewards up to $t_1$.
- Mint $N$ tokens.
- Store $p(t_0)$ and $R1(t_0)$.
- Send $N \lambda(t_0)$ tokens to the L2Reservoir contract using the bridge, and also notify the L2Reservoir of the values $q(t_0) = p(t_0) \lambda(t_0)$ and $r$. Note: if $\lambda$ is updated, this needs to change a bit, to compute the correct difference based on the previous $t_1$ like we did to compute $N$ above. See below for details.

This drip would then be such that, if all rewards at any particular time up to $t_1$ are computed based on the stored values for $p$, $R1$, $R2$, $\lambda$, the tokens available on the Reservoir on each layer should be sufficient to cover these rewards.
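The steps above can be condensed into a toy Python sketch of the accounting (a hypothetical simplification: floating point instead of fixed-point GRT amounts, a constant $\lambda$ since the last drip, and no keeper reward; the function and field names are illustrative, not the contract interface):

```python
BLOCKS_PER_WEEK = 7 * 7200  # one week at 7200 blocks/day (post-Merge)

def delta_r(p0, r, dt):
    """Delta R(t0 + dt, t0) = p(t0) * (r^dt - 1)."""
    return p0 * (r ** dt - 1)

def drip(state, t, lam):
    """Update reservoir state at block t; returns (minted, sent_to_l2).
    state holds t0, t1, p (= p(t0)) and r."""
    p0, r = state["p"], state["r"]
    # epsilon: future rewards already minted (negative if drip was called late)
    eps = delta_r(p0, r, state["t1"] - state["t0"]) - delta_r(p0, r, t - state["t0"])
    # Snapshot p at the new t0 by adding rewards accumulated since the old t0.
    state["p"] = p0 + delta_r(p0, r, t - state["t0"])
    state["t0"], state["t1"] = t, t + BLOCKS_PER_WEEK
    # n: total rewards from now until the new t1; mint only what isn't covered.
    n = delta_r(state["p"], r, BLOCKS_PER_WEEK)
    minted = n - eps
    return minted, lam * minted
```

Calling `drip` exactly at $t_1$ makes $\epsilon = 0$, so the mint covers precisely one more week of rewards.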

Calls to take rewards after $t_1$ may revert if there aren’t enough funds, in which case a call to `drip()` is needed before closing the allocation.

When sending the tokens to L2, the L1Reservoir uses the L1GraphTokenGateway described in GIP-0031, and includes the necessary calldata so that the gateway on L2 calls `L2Reservoir.receiveDrip` to update the necessary variables.

One more thing to note is that updating $\lambda$ will only take effect the next time `drip` is called. Updating $r$ (issuance rate) requires snapshotting rewards per signal, and will now also require a call to `drip` for it to take effect.

As mentioned before, when the value of $\lambda$ changes between two drips, the behavior of `L1Reservoir.drip` must be slightly modified to correctly compute the amount that should be sent to L2. Suppose the last time we called the drip the value was $\lambda_{old}$, and it’s now $\lambda_{new}$. Recalling the values for $\epsilon$ and $n$ that we computed (where $\epsilon$ can be positive or negative depending on whether $t < t_1$), the value to send to L2 (let’s call it $N_{L2}$) is:

$$N_{L2} = \lambda_{new}\, n - \lambda_{old}\, \epsilon$$

This corrects for the tokens that have already been sent to L2 with the previous $\lambda$, which should be effective until the current block. In the edge case where the new $\lambda$ is lower than the old value, it’s possible that this subtraction produces an underflow and reverts; in this case, we would need to wait for some blocks (in the worst case, until $t_1$) so that $\epsilon$ becomes small enough and the call can succeed.
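A minimal sketch of this correction, assuming the amount for L2 takes the form $\lambda_{new}\,n - \lambda_{old}\,\epsilon$ and modeling the Solidity underflow as an exception (the function name is hypothetical):

```python
def l2_amount(n, eps, lam_old, lam_new):
    """Tokens to send to L2 when lambda changed between two drips."""
    n_l2 = lam_new * n - lam_old * eps
    if n_l2 < 0:
        # Mirrors the underflow revert: the caller must wait until eps shrinks.
        raise ValueError("would underflow; retry closer to t1")
    return n_l2
```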

It’s worth noting that the block number for $t_0$ stored on L2 might differ from the value on L1 because of the drift between block numbers on each layer, or because the retryable ticket wasn’t redeemed immediately. We can mitigate this in several ways, but we propose simply storing $t_{0_{L2}} = block.number$ when the message is received in L2.

This means the amount for L2 rewards might not *exactly* match what’s needed to cover one full week if the time difference between the chains changes throughout the week, so the drip function might have to be called slightly before the 7-day mark if funds on L2 are running low.

Additionally, we have to consider the risk that the drip transaction will fail on L2 for lack of gas, in which case it could be retried later, so we could have messages received out of order. To mitigate this, we propose including an incrementing nonce in the message sent from L1, and checking on L2 that the nonce has the expected value. A governance function can allow setting an arbitrary expected nonce on L2 to mitigate the edge case where a retryable ticket expires and is never received. Retryable tickets staying unredeemed for more than a few blocks would be undesirable, however, so it would be good for several actors to set up monitoring tasks that redeem any drip tickets that have failed for insufficient L2 gas. A keeper reward described below, and the economic security from Indexer staking, should incentivize callers of `drip` to redeem the tickets immediately.
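The nonce check could look roughly like the following toy Python model (hypothetical; the real L2Reservoir implements this in Solidity, and `receiveDrip` would take more parameters than shown):

```python
class L2ReservoirModel:
    """Toy model of the drip-message ordering guard on L2."""

    def __init__(self):
        self.expected_nonce = 0
        self.issuance_base = 0.0  # stands in for q(t0) = lambda(t0) * p(t0)

    def receive_drip(self, nonce, q0):
        if nonce != self.expected_nonce:
            # An out-of-order retryable ticket fails here and can be retried
            # once the earlier tickets have been redeemed.
            raise ValueError("INVALID_NONCE")
        self.expected_nonce += 1
        self.issuance_base = q0

    def set_next_drip_nonce(self, nonce):
        # Governance escape hatch for a ticket that expired unredeemed.
        self.expected_nonce = nonce
```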

### Updated rewards distribution

By defining our total rewards function as we did above, most of the rewards calculation and distribution can stay the same. The only thing that changes is how we calculate $\rho$ (rewards per signal) at any point in time, whether to distribute rewards or to compute a snapshot. Rather than computing it directly, we compute the accumulated rewards for layer $l$ at time $t$ with the formula defined above:

$$Rl(t) = Rl(t_0) + \Delta Rl(t, t_0)$$

Where $\Delta Rl$ is $\Delta R1$ on Layer 1 and $\Delta R2$ on L2:

$$\Delta R1(t, t_0) = (1 - \lambda(t_0))\, p(t_0)\left(r^{t - t_0} - 1\right)$$

$$\Delta R2(t, t_0) = \lambda(t_0)\, p(t_0)\left(r^{t - t_0} - 1\right)$$

And $Rl(t_0)$ is the $R1(t_0)$ or $R2(t_0)$ stored in the Reservoir for each layer. Note that to compute $R2(t)$ in L2, for gas efficiency we can keep $q(t_0) = \lambda(t_0)p(t_0)$ as a single storage variable and compute:

$$R2(t) = R2(t_0) + q(t_0)\left(r^{t - t_0} - 1\right)$$

Now on to how we compute the rewards for each subgraph and allocation:

We want to ensure that, for each layer $l$, and for every time interval $[t_1, t_2]$ **in which signal is constant,** the rewards accumulated by each subgraph are the fair share of the total rewards:

$$\Delta R_i(t_1, t_2) = \frac{\sigma_i}{\sigma_{T_{Ll}}}\, \Delta Rl(t_1, t_2)$$

Where $\sigma_{T_{Ll}}$ is the total signal on layer $l$, so $\sum_{i \in S_{Ll}}{\frac{\sigma_i}{\sigma_{T_{Ll}}}} = 1$.

So the rewards per signal for this interval, where $\sigma_{T_{Ll}}$ is constant, is:

$$\Delta\rho(t_1, t_2) = \frac{\Delta Rl(t_1, t_2)}{\sigma_{T_{Ll}}}$$

Since this is only valid while signal is constant, we need to compute and keep the accumulated rewards per signal snapshotted whenever signal changes (like we currently do at each $t_\sigma$), and then we can compute the new value for it:

$$\rho(t) = \rho(t_\sigma) + \frac{Rl(t) - Rl(t_\sigma)}{\sigma_{T_{Ll}}}$$

Note this also requires us to snapshot the total accumulated rewards on each layer ($Rl(t_\sigma)$) whenever signal changes on that layer, which makes the rewards calculation a bit more expensive.

For completeness, the expanded formula for Layer 1 is:

$$\rho(t) = \rho(t_\sigma) + \frac{R1(t_0) + (1 - \lambda(t_0))\, p(t_0)\left(r^{t - t_0} - 1\right) - R1(t_\sigma)}{\sigma_{T_{L1}}}$$

And for Layer 2:

$$\rho(t) = \rho(t_\sigma) + \frac{R2(t_0) + q(t_0)\left(r^{t - t_0} - 1\right) - R2(t_\sigma)}{\sigma_{T_{L2}}}$$

Now that we can compute rewards per signal at any given time, we can compute the delta for a subgraph $i$ in layer $l$:

$$\Delta R_i(t) = \sigma_i \left(\rho(t) - \rho_i(t_{\sigma_i}^-)\right)$$

We simply have to use the correct formula for $\rho$ on each layer and, as we currently do, compute and snapshot the value when the signal for a subgraph has changed.

After that, the algorithm for rewards distribution is identical to its current form as described above.

Considering how these values depend on information stored on the Reservoir, we suggest exposing the function for $Rl(t)$ on the Reservoir (and L2Reservoir) and calling it from RewardsManager as needed.
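Such helpers could be sketched as follows (hypothetical function names and signatures, assuming the per-layer accumulated-rewards formulas given in this section):

```python
def accumulated_rewards_l1(r1_t0, p_t0, lam_t0, r, t, t0):
    """R1(t) = R1(t0) + (1 - lambda(t0)) * p(t0) * (r^(t - t0) - 1)."""
    return r1_t0 + (1 - lam_t0) * p_t0 * (r ** (t - t0) - 1)

def accumulated_rewards_l2(r2_t0, q_t0, r, t, t0):
    """R2(t) = R2(t0) + q(t0) * (r^(t - t0) - 1), where q(t0) = lambda(t0)*p(t0)
    is kept as a single storage variable for gas efficiency."""
    return r2_t0 + q_t0 * (r ** (t - t0) - 1)
```

By construction, the rewards newly accrued on both layers sum to the global $\Delta R(t, t_0)$.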

### Burning denied/unclaimed rewards

Since the drip function will now mint all the potential rewards for an upcoming week, it’s possible that some of these will not be claimed or distributed, in which case they could accumulate in the Reservoir. To mitigate this, we propose burning the accumulated rewards for an allocation in three scenarios where we currently don’t mint them:

- When an allocation is closed with a zero POI.
- When an allocation is closed late (after `maxAllocationEpochs`) by someone who is not the allocation’s indexer.
- When the Subgraph Availability Oracle denies rewards for a subgraph.

There is a fourth scenario where rewards may accumulate: when a subgraph is under the minimum signal threshold. In this case, we do not compute the rewards as accrued for the subgraph while the condition holds, and start accumulating afterwards. To keep this behavior, we propose not doing anything special here, so a small amount of GRT might still accumulate in the Reservoir. Future GIPs may find ways to address this if it becomes a significant amount at any point.

### Keeper reward for calling the drip function

Any indexer wishing to close an allocation has an incentive to call the drip function so that rewards are available to be distributed. For this reason, we propose adding this drip call as an optional feature in the Indexer Agent. This call will, however, incur a potentially high gas fee, as it will include some computation on L1 plus a retryable ticket for L2.

Therefore, it would make sense to reward the caller of this function (the “keeper”) so that they can offset the cost of gas. We propose using additional GRT issuance to cover this reward, minted by the L1Reservoir but delivered to the keeper on L2 by the L2Reservoir, so that the reward is only given if the caller uses the correct parameters and redeems the retryable ticket in L2. Moreover, a fraction of the reward would be sent to whoever redeems the ticket in L2, to incentivize the use of correct L2 gas parameters.

The proposed formula for the keeper reward $K$ at block $t$ is:

$$K(t) = \kappa \left(t - t_0 - t_{Kmin}\right)$$

Where $t_0$ is the last time `drip` was called, $t_{Kmin}$ is a minimum interval for calling `drip` set by governance (`minDripInterval` in the L1Reservoir), and $\kappa$ is a constant set by governance (`dripRewardPerBlock` in the L1Reservoir). The value of $\kappa$ can be set to ensure that, as long as the price of gas in GRT stays within a certain range, it is always profitable to call this function before the week-long drip interval is over. This reward should be negligible when compared to GRT issuance from indexer rewards (and if it’s not, that’s probably a sign that L1 gas is high enough that it’s time to move the issuance to L2).
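Assuming the reward grows linearly in the blocks elapsed beyond the minimum interval (consistent with a per-block $\kappa$), the agent-side profitability check could be sketched as follows (hypothetical function names; the gas cost estimate would come from the agent's own tooling):

```python
def keeper_reward(kappa, t, t0, min_drip_interval):
    """K(t) = kappa * (t - t0 - t_Kmin), floored at zero within the interval."""
    blocks = t - t0 - min_drip_interval
    return kappa * blocks if blocks > 0 else 0.0

def drip_is_profitable(kappa, t, t0, min_drip_interval, gas_cost_grt):
    # gas_cost_grt: estimated total cost in GRT (L1 gas + retryable ticket).
    return keeper_reward(kappa, t, t0, min_drip_interval) > gas_cost_grt
```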

The Indexer Agent can therefore check whether a call to `drip` would be profitable, and perform it if that is the case. This call is vulnerable to MEV/frontrunning if sent in the open, so it would be preferable to send it through a private channel like Flashbots, so as to not consume gas if someone else will run it first in the same block.

Calling `drip` should be done with the correct parameters to ensure auto-redemption of the retryable ticket, but since this is hard to guarantee, Indexers should try to redeem the ticket immediately if the auto-redeem fails. It should be a slashable offense to repeatedly and purposefully create tickets with incorrect parameters and not attempt to redeem them. It should also be a slashable offense to cancel a retryable ticket, since this affects the whole network and requires corrective action from governance (i.e. manually adding funds to the L2Reservoir and fixing the expected nonce).

(Continues…)