Ethereum client and JSON-RPC API for The Graph network

Hello everyone!

At the moment we are running an OpenEthereum node for our indexer, and we are starting to consider a second Ethereum node as a backup in case of failure. We are also considering using an external service for this.
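A minimal sketch of the backup-node idea, assuming both nodes expose plain JSON-RPC over HTTP. The helper names (`http_post`, `call_with_failover`) and the endpoint URLs in the usage note are hypothetical. The sketch tries each endpoint in order and returns the first successful result:

```python
import json
import urllib.request
from typing import Any, Callable, Sequence


def http_post(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send one JSON-RPC request over HTTP and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def call_with_failover(
    endpoints: Sequence[str],
    method: str,
    params: Sequence[Any] = (),
    send: Callable[[str, dict], dict] = http_post,
) -> Any:
    """Try each endpoint in order; return the first successful result."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": list(params)}
    last_error: Exception | None = None
    for url in endpoints:
        try:
            response = send(url, payload)
        except (OSError, ValueError) as exc:  # network failure or invalid JSON
            last_error = exc
            continue
        if "error" in response:
            # Simplification: any JSON-RPC error also moves on to the next node.
            last_error = RuntimeError(str(response["error"]))
            continue
        return response["result"]
    raise RuntimeError(f"all endpoints failed: {last_error}")
```

Usage would look like `call_with_failover(["http://primary:8545", "http://backup:8545"], "eth_blockNumber")`, where both URLs are placeholders. A production setup would more likely put a load balancer or health-checking proxy in front of the nodes rather than retry in the client.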

From the official documentation for indexers:

Ethereum endpoint - An endpoint that exposes an Ethereum JSON-RPC API. This may take the form of a single Ethereum client or it could be a more complex setup that load balances across multiple. It’s important to be aware that certain subgraphs will require particular Ethereum client capabilities such as archive mode and the tracing API.

In other words, for some subgraphs we need an archive node with support for the tracing JSON-RPC API.
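One way to check whether a given endpoint offers both capabilities is to probe it directly. The sketch below is an assumption-laden heuristic, not an official test: it treats a successful `eth_getBalance` at block `0x1` as a sign of archive state (a pruned node is expected to return an error for state that old), and a successful `trace_block` call as a sign that the OpenEthereum-style trace API is available. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Sequence

ZERO_ADDRESS = "0x0000000000000000000000000000000000000000"


def http_post(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """Send one JSON-RPC request over HTTP and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def _succeeds(
    url: str,
    method: str,
    params: Sequence[Any],
    send: Callable[[str, dict], dict],
) -> bool:
    """Return True if the call comes back without a JSON-RPC error."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": list(params)}
    try:
        response = send(url, payload)
    except (OSError, ValueError):  # unreachable node or invalid JSON
        return False
    return "error" not in response


def probe_capabilities(
    url: str, send: Callable[[str, dict], dict] = http_post
) -> dict:
    """Heuristically report archive and tracing support for one endpoint."""
    return {
        # Historical state lookup: assumed to fail on pruned nodes.
        "archive": _succeeds(url, "eth_getBalance", [ZERO_ADDRESS, "0x1"], send),
        # trace_* module: assumed to be missing on clients without tracing.
        "tracing": _succeeds(url, "trace_block", ["latest"], send),
    }
```

Running `probe_capabilities` against each candidate node or external service gives a quick first filter before pointing graph-node at it. It does not prove that the traces match what graph-node expects.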

Ethereum documentation - Nodes and clients | ethereum.org

Stand-alone clients

| Client | Status | Archive | Tracing | CPU/RAM | Disk | Costs, $/m | Comment |
|---|---|---|---|---|---|---|---|
| Geth | Active | :white_check_mark: | :x: | 4+/8GB | 9.3 TB | 1,246.60 | c5.2xlarge/gp2 (9.750) - N. Virginia |
| Openethereum | Deprecated | :white_check_mark: | :white_check_mark: | 4+/8GB | 9.5 TB | 1,510.35 | m5a.xlarge/gp2 (12.012) - Ohio |
| Erigon | Active | :white_check_mark: | :white_check_mark: | 8/16GB | 2 TB | 467.16 | m5a.xlarge/gp3 (3.020) - Ohio |
| Akula | Alpha | :white_check_mark: | :white_check_mark: | ? | 2 TB | ? | |
| Nethermind | Active | :white_check_mark: | :white_check_mark: | 6/32GB | 4.5+ TB | ? | |
| Besu | Active | :white_check_mark: | :x: | 4/16GB | 3 TB | ? | |
| Parity Ethereum | Deprecated | - | - | - | - | - | |
| Aleth | Deprecated | - | - | - | - | - | |

External services

Please share your experience, the client you are using, and any additional information. If you prefer to use an external service instead of hosting your own node, please share which one and why.

I will try to keep this post updated with all information.

Thank you!


Thanks for posting this!

The only alternative to OE, for the tracing that is used by graph-node, is Erigon. It has come a long way with Graph compatibility but we still have work to do in order to match OE’s behavior. Many Indexers have already made the transition from OE to Erigon, and a small subset of those Indexers, with assistance from the Erigon team and some Edge&Node devs, are actively helping improve Erigon’s tracing capabilities for Graph.

We have an Erigon working group led by @chris that runs on Thursdays; you can find the event on the info@thegraph.foundation calendar.

We self host the following for indexing:

Mainnet Indexer
2x OE archive nodes
1x Geth archive node
4x Erigon archive nodes (not used in production yet)

Testnet Indexer
1x Geth rinkeby archive node
1x Erigon rinkeby archive node
Mainnet nodes are used for indexing and the above nodes are used for the testnet subgraph and indexer-agent

Costs are all upfront. The most painful part of self-hosting is paying upfront for disk growth, to make sure you always have the storage capacity to support the growth of the larger nodes. This can be very expensive if you factor in disk redundancy.


@cryptovestor, thank you for the detailed reply!

We also started using Erigon but ran into an issue when we hit the 2 TB DB limit. It currently runs as an application in a VM, and we are now considering running it in containers.

Before using external services we need at least some usage data, and we can probably use a proxy to collect the stats.
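As a rough illustration of the stats a proxy could collect, the sketch below counts JSON-RPC method names from raw request bodies, handling both single and batch requests. It covers only the counting step; wiring it into an actual reverse proxy is left out, and the function name `tally_methods` is hypothetical.

```python
import json
from collections import Counter


def tally_methods(raw_body: bytes, counts: Counter) -> None:
    """Count JSON-RPC method names found in one HTTP request body."""
    try:
        parsed = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        counts["<invalid>"] += 1
        return
    # A batch request is a JSON array of calls; normalise to a list.
    calls = parsed if isinstance(parsed, list) else [parsed]
    for call in calls:
        method = call.get("method") if isinstance(call, dict) else None
        counts[method if isinstance(method, str) else "<invalid>"] += 1


counts: Counter = Counter()
tally_methods(b'{"jsonrpc":"2.0","id":1,"method":"eth_getLogs","params":[]}', counts)
tally_methods(b'[{"method":"eth_call"},{"method":"eth_getLogs"}]', counts)
print(counts.most_common())
```

Per-method counts like these are what most external providers bill or rate-limit on, so they give a first estimate of which pricing tier a given indexing workload would need.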

For example, Anyblock states that they are ready for The Graph, and I found their announcement on the forum.

I have some questions about the setup you described:

  1. Why are you using Geth for The Graph?
  2. How are you running Erigon: standalone or in containers?
  3. Did you consider using external services, and why?

#1 Geth is (unofficially, I suppose) a reference client. I like to compare sync performance of OE against Geth for supported subgraphs (Geth cannot be used for trace features). I also like to use Geth with subgraphs that often have issues with OE. One example of this is the EIP721 subgraph which had some issues when syncing on OE (they might be fixed now, I don’t know, I still sync that one with Geth). So I have a graph-node instance that is only connected to this Geth instance and if I want to sync a subgraph with Geth I re-assign the subgraph to that graph-node.

#2 I have a mix of Erigon containers and VMs. I am moving to all containers using docker-compose in the near future in order to have a standardised execution environment. I have had some issues in the past with compiling Erigon myself that do not arise when I use their Docker images.

#3 I use Infura for mission-critical transaction execution. It just makes sense to use their very high investment in infrastructure for the most important transactions I execute, considering how cheap they are in the lower tiers. This mostly means the transactions executed by indexer-agent. I would never consider third parties for syncing subgraphs because it’s vastly cheaper for me to self-host a set of highly available archive nodes and I have the skillset to manage the infrastructure.

On the question of usage data, I would be very keen to explore this in more depth. At a minimum, I would like to have self-hosted dashboards similar to the Infura dashboard.

And ideally I would like to be able to break this data down by subgraph, so we can measure the activity generated by each one. Not sure if this is possible, but definitely something I would find very valuable for making cost and pricing decisions. If that sort of product existed, I would be happy to share the output usage data with the community and I’m sure other Indexers would too.


An interesting and related topic
Testnet Docker Guide by StakeSquid / Archive node options/Service Providers (WIP)