Are contract calls *always* deterministic?

When calling a contract using the try_ call construction:

import { Address, BigInt } from "@graphprotocol/graph-ts";
// `contract` below is the generated contract binding and constants.BIGINT_ZERO
// a local zero constant; their imports are omitted here.

export function getResult(address: Address): BigInt {
  let callResult = contract.try_getData(address);
  if (callResult.reverted) {
    // Call reverted (or could not be decoded): fall back to zero.
    return constants.BIGINT_ZERO;
  } else {
    return callResult.value;
  }
}

If there is a problem with the web3 node, is it possible for the result of this function to differ between syncs?

In the past, I’ve hit rate limit errors on the public infrastructure which implied that after 10 retries the call would time out. I don’t know exactly what happens after that point: whether the call gets skipped, whether it reports callResult.reverted, or something else. Could this lead to inconsistencies between syncs?

EDIT: To be clear, I’m not asking whether the result returned from a contract call is deterministic; I’m asking whether the contract.try_function call itself is deterministic or whether it can sometimes fail.

To follow up on this, and perhaps @jannis can give some context:

In our subgraphs we have found the following fatal error:

 "message": "Failed to process trigger: Failed to invoke handler 'handleBlock': Failed to call function \"state\" of contract \"GovernorAlpha\": ethereum node took too long to perform call\nwasm backtrace:\n  0: 0x1725 - <unknown>!~lib/@graphprotocol/graph-ts/chain/ethereum/ethereum.SmartContract#tryCall\n  1: 0x494c - <unknown>!generated/GovernorAlpha/GovernorAlpha/GovernorAlpha#try_state\n  2: 0x4b15 - <unknown>!src/utils/ProposalStatus/handleBlock\n"

In this case the failure is that the call took too long to process, but I don’t understand what the other possible states of this error could be.

Generally, contract calls are retried in case of unexpected errors, such as networking issues, HTTP error codes being returned, etc. There are also expected errors, which are not retried; these include things like ABI mismatches / decoding errors, or regular call reverts/exceptions caused by the EVM.

This behavior should ensure deterministic indexing results.
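
To make that classification concrete, here is a rough sketch in TypeScript. This is conceptual only, not graph-node's actual code; the CallError shape and the shouldRetry helper are made up for illustration:

// Conceptual sketch of the error classification described above.
// The CallError shape and shouldRetry helper are hypothetical.
type CallError =
  | { kind: "revert" }               // EVM revert/exception: expected, not retried
  | { kind: "abi_mismatch" }         // ABI/decoding error: expected, not retried
  | { kind: "http"; status: number } // e.g. 429 or 5xx: unexpected, retried
  | { kind: "network" };             // connection failure: unexpected, retried

// Expected (deterministic) errors are surfaced to the mapping, e.g. as
// callResult.reverted; unexpected ones are retried before anything is indexed.
function shouldRetry(err: CallError): boolean {
  return err.kind === "http" || err.kind === "network";
}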

However, it looks like timeouts are currently treated as a fatal subgraph error.

Being rate limited, like you mention, should result in a 429 HTTP status, not a timeout. Such requests would be retried. But we can’t be sure that all Ethereum nodes/providers are compliant with this; some may cause timeouts instead. Besides, networking errors can also lead to timeouts and should be retried.
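
As an illustration of the behavior I would expect from a provider client, here is a sketch that treats both a 429 response and a request timeout as retryable. This is not graph-node's actual implementation; the timeout and backoff values are assumptions, and handling of other error statuses is omitted:

// Sketch: treat both 429 responses and request timeouts as retryable.
// The 10s timeout and backoff numbers are illustrative assumptions.
async function ethCall(rpcUrl: string, params: unknown[]): Promise<unknown> {
  let delayMs = 500;
  for (;;) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 10_000); // request timeout
    try {
      const response = await fetch(rpcUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_call", params }),
        signal: controller.signal,
      });
      if (response.status !== 429) {
        // EVM reverts arrive in the JSON body, not as an HTTP error status.
        return await response.json();
      }
      // Compliant provider signalling rate limits: fall through and retry.
    } catch {
      // Timeout or other networking failure; a non-compliant provider may
      // surface rate limiting this way, so treat it as retryable too.
    } finally {
      clearTimeout(timer);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 2, 30_000); // exponential backoff, capped
  }
}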

I’d like to double check this with @Leo and @lutter—there may be a reason for the current behavior—but I think graph-node should retry on timeouts as well, instead of failing the subgraph.

The output of a try_ call is deterministic if it is returned to the mapping. But it may cause non-deterministic subgraph failures, such as in the case of a timeout; in that case the subgraph will attempt to resume syncing once the Graph Node is restarted. Rate limiting errors on eth_call are currently retried forever.

@Leo - Thanks for the info! It’s good to know rate limiting errors are retried forever.

I think it would be nice/useful if network failures were also retried forever. It seems a bit drastic that a network failure, in this case a failure of the node to respond, would kill the subgraph. The cost of the subgraph going down is however long the indexing period is; in my case, with subgraphs that take weeks to sync, an unexpected network error might set me back a month (unless the resync is faster, which I honestly haven’t seen make a substantial difference).

A Graph Node timeout on eth_call isn’t a very likely event, since the timeout for EVM execution is typically much shorter, and in that case we already retry. But generally we might not safeguard against network failures in all places, so they might cause subgraphs to fail. I’d like to eventually have us retry all non-deterministic failures as a solution to this.
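
As a sketch of what such a higher-level retry could look like (hypothetical code, not graph-node's implementation; the error class and backoff numbers are assumptions):

// Hypothetical wrapper: retry any error that is not known to be deterministic.
class DeterministicError extends Error {}

async function retryUnlessDeterministic<T>(op: () => Promise<T>): Promise<T> {
  let delayMs = 1_000;
  for (;;) {
    try {
      return await op();
    } catch (e) {
      if (e instanceof DeterministicError) {
        // Reverts, ABI mismatches, etc.: the result would be the same on
        // every retry, so surface the failure immediately.
        throw e;
      }
      // Timeouts, rate limits, networking errors: wait and try again
      // instead of marking the subgraph as failed.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      delayMs = Math.min(delayMs * 2, 60_000); // exponential backoff, capped
    }
  }
}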

But if it happens you don’t need to resync; the subgraph will attempt to resume syncing when the Graph Node is restarted.

Ah, so the subgraph needs to be manually restarted; I’m actually not familiar with how to do this. I assume I would simply need to redeploy. In our architecture, however, that might not work: we generate our subgraphs right before deployment, and a redeploy might include minor (even immaterial) changes that would alter the IPFS hash (since we don’t store the individual subgraph, we generate and deploy on the spot).

I’m not sure that requiring a graph-node restart is a good idea anymore. Is there any reason why we couldn’t retry on timeouts as well?

Sure, why not, opened a PR. But my point is that we should have a higher-level retry for any error that is not known to be deterministic.

Approved, thanks @Leo! :+1: