GIP-0022: Tendermint indexing for Graph node


Abstract


Motivation


Prior Art


High Level Description


Detailed Description

Data extraction from Tendermint node

The Tendermint library is the foundation of many networks (including those based on the cosmos-sdk). One of its packages provides a basic internal indexing capability: the Indexer package receives data from a synchronous event bus. This bus is also used by multiple components of a network node to communicate internally and trigger actions based on the node's events. This package architecture will be leveraged to extract information from the process.

A modified version of the Tendermint package would be injected into the node itself, extending its indexing capabilities. The process will subscribe to the synchronous event bus, retrieve incoming events (like new block, new transaction, new evidence) and persist them to external storage (such as the filesystem). Operating on the event bus of the Tendermint library allows the indexing process to be truly network-agnostic.
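For illustration only, here is a minimal Go sketch of what such a subscription could look like inside the augmented library. It assumes the Tendermint v0.34 EventBus API (bus.Subscribe, types.EventQueryNewBlock, types.EventQueryTx); the EventStore interface and StreamEvents function are hypothetical placeholders, not the actual implementation:

package extractor

import (
	"context"

	"github.com/tendermint/tendermint/types"
)

// EventStore is a hypothetical stand-in for the external storage
// (e.g. a filesystem writer) that extracted events are persisted to.
type EventStore interface {
	Persist(kind string, event interface{}) error
}

// StreamEvents subscribes to the node's synchronous event bus and forwards
// new-block and new-transaction events to external storage.
func StreamEvents(ctx context.Context, bus *types.EventBus, store EventStore) error {
	blockSub, err := bus.Subscribe(ctx, "extractor", types.EventQueryNewBlock)
	if err != nil {
		return err
	}
	txSub, err := bus.Subscribe(ctx, "extractor", types.EventQueryTx)
	if err != nil {
		return err
	}
	go func() {
		for {
			select {
			case msg := <-blockSub.Out():
				if block, ok := msg.Data().(types.EventDataNewBlock); ok {
					_ = store.Persist("block", block)
				}
			case msg := <-txSub.Out():
				if tx, ok := msg.Data().(types.EventDataTx); ok {
					_ = store.Persist("tx", tx)
				}
			case <-ctx.Done():
				return
			}
		}
	}()
	return nil
}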

To achieve this, the only change required in the network's node is to replace the original version of the library with the augmented one. This is done in the go.mod file using a simple replace directive on the relevant versions:

replace github.com/tendermint/tendermint => github.com/figment-networks/tendermint v0.34.9-augmented

This modification has to be made for all the crucial (and latest) versions of the node for every targeted network. To support all the changes to the original network data structures throughout the history of the network, an external repository with protobuf types was created. This structure will always be consistent with the Graph Node code, regardless of any potential changes in the Tendermint library. The node will map its internal types onto the respective external ones. The external types should match the original Tendermint types as closely as possible, to give the subgraph author a seamless experience.

Firehose data

Further in the process, the node's output is passed into a dedicated Firehose stack instance, which in turn feeds it into the Graph Node. As with other Firehose integrations, a series of consecutive height-related rows is grouped into one structure, creating the "everything that happened at one height" protobuf structure called the EventList:

message EventList {
  EventBlock newblock = 1;
  repeated EventTx transaction = 2;
  EventValidatorSetUpdates validatorsetupdates = 3;
}

This structure is later wrapped in the bstream.Block structure for processing and is eventually passed into the Graph Node along with the height and hash of the corresponding block.

Network data in Tendermint structures - cosmos-sdk example

Structures that are extracted from Tendermint are the core events of the Tendermint event bus (https://github.com/tendermint/tendermint/blob/v0.35.0/types/events.go).

In every Tendermint-based network, the event bus carries base structures like blocks and transactions. Most of the fields of these structures are strongly typed, however not all: the network-dependent data is passed as protobuf-encoded byte slices.

The structure we will mostly focus on is:

// Data contains the set of transactions included in the block
type Data struct {
	Txs [][]byte `protobuf:"bytes,1,rep,name=txs,proto3" json:"txs,omitempty"`
}

The byte slices in that slice are protobuf-encoded transactions supplied by higher-level libraries.

Using the example of the cosmos-sdk (https://github.com/cosmos/cosmos-sdk/blob/master/proto/cosmos/tx/v1beta1/tx.proto#L14):

message Tx {
  // body is the processable content of the transaction
  TxBody body = 1;

  // auth_info is the authorization related content of the transaction,
  // specifically signers, signer modes and fee
  AuthInfo auth_info = 2;

  // signatures is a list of signatures that matches the length and order of
  // AuthInfo's signer_infos to allow connecting signature meta information like
  // public key and signing mode by position.
  repeated bytes signatures = 3;
}
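To make this concrete, the following is a hedged Go sketch of decoding one of those byte slices into the cosmos-sdk Tx type. It assumes the gogoproto-generated Unmarshal method on the type in github.com/cosmos/cosmos-sdk/types/tx and is illustrative rather than part of the proposed implementation:

import (
	txtypes "github.com/cosmos/cosmos-sdk/types/tx"
)

// decodeTx decodes a single entry of Data.Txs into the cosmos-sdk Tx message.
// txtypes.Tx is the gogoproto-generated Go type for the Tx message shown above.
func decodeTx(raw []byte) (*txtypes.Tx, error) {
	var tx txtypes.Tx
	if err := tx.Unmarshal(raw); err != nil {
		return nil, err
	}
	return &tx, nil
}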

In particular, the cosmos-sdk implementation adds another level of transaction granularity; inside the TxBody you will find:

message TxBody {
  // messages is a list of messages to be executed. The required signers of
  // those messages define the number and order of elements in AuthInfo's
  // signer_infos and Tx's signatures. Each required signer address is added to
  // the list only the first time it occurs.
  // By convention, the first required signer (usually from the first message)
  // is referred to as the primary signer and pays the fee for the whole
  // transaction.
  repeated google.protobuf.Any messages = 1;

  // memo is any arbitrary note/comment to be added to the transaction.
  // WARNING: in clients, any publicly exposed text should not be called memo,
  // but should be called `note` instead (see https://github.com/cosmos/cosmos-sdk/issues/9122).
  string memo = 2;

  // timeout is the block height after which this transaction will not
  // be processed by the chain
  uint64 timeout_height = 3;

  // extension_options are arbitrary options that can be added by chains
  // when the default options are not sufficient. If any of these are present
  // and can't be handled, the transaction will be rejected
  repeated google.protobuf.Any extension_options = 1023;

  // extension_options are arbitrary options that can be added by chains
  // when the default options are not sufficient. If any of these are present
  // and can't be handled, they will be ignored
  repeated google.protobuf.Any non_critical_extension_options = 2047;
}

In the example above, the fields messages, extension_options and non_critical_extension_options are lists of Google's Any type (https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto):

message Any {
  string type_url = 1;
  bytes value = 2;
}

This effectively means that inside each entry you should expect a pair of a type name and a byte array.

Example #1:

Cosmos' bank module (https://github.com/cosmos/cosmos-sdk/tree/master/x/bank), which is used to send tokens, would have type_url: cosmos.bank.v1beta1.MsgSend and a value of (https://github.com/cosmos/cosmos-sdk/blob/master/proto/cosmos/bank/v1beta1/tx.proto#L21):

message MsgSend {
  option (gogoproto.equal)           = false;
  option (gogoproto.goproto_getters) = false;

  string   from_address                    = 1 [(cosmos_proto.scalar) = "cosmos.AddressString"];
  string   to_address                      = 2 [(cosmos_proto.scalar) = "cosmos.AddressString"];
  repeated cosmos.base.v1beta1.Coin amount = 3
      [(gogoproto.nullable) = false, (gogoproto.castrepeated) = "github.com/cosmos/cosmos-sdk/types.Coins"];
}
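As an illustrative Go sketch (not the proposed subgraph-side mechanism, which operates on the protobuf definitions supplied by the subgraph author), unwrapping such an Any by inspecting its type_url could look like this; note that on live chains the type_url is typically prefixed with a slash, e.g. /cosmos.bank.v1beta1.MsgSend:

import (
	"fmt"
	"strings"

	codectypes "github.com/cosmos/cosmos-sdk/codec/types"
	banktypes "github.com/cosmos/cosmos-sdk/x/bank/types"
)

// decodeMessage inspects the Any's type_url and unmarshals its value into the
// matching concrete type; only MsgSend is handled in this sketch.
func decodeMessage(msg *codectypes.Any) (interface{}, error) {
	switch strings.TrimPrefix(msg.TypeUrl, "/") {
	case "cosmos.bank.v1beta1.MsgSend":
		var send banktypes.MsgSend
		if err := send.Unmarshal(msg.Value); err != nil {
			return nil, err
		}
		return &send, nil
	default:
		return nil, fmt.Errorf("unhandled message type: %s", msg.TypeUrl)
	}
}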

Example #2:

Terra's wasm module (https://github.com/terra-money/core/tree/main/x/wasm), which is used for wasm smart contract interaction, would for contract execution have type_url: terra.wasm.v1beta1.MsgExecuteContract and a value of (https://github.com/terra-money/core/blob/main/proto/terra/wasm/v1beta1/tx.proto#L95):

message MsgExecuteContract {
  option (gogoproto.equal)           = false;
  option (gogoproto.goproto_getters) = false;

  // Sender is the that actor that signed the messages
  string sender = 1 [(gogoproto.moretags) = "yaml:\"sender\""];
  // Contract is the address of the smart contract
  string contract = 2 [(gogoproto.moretags) = "yaml:\"contract\""];
  // ExecuteMsg json encoded message to be passed to the contract
  bytes execute_msg = 3
      [(gogoproto.moretags) = "yaml:\"execute_msg\"", (gogoproto.casttype) = "encoding/json.RawMessage"];
  // Coins that are transferred to the contract on execution
  repeated cosmos.base.v1beta1.Coin coins = 5 [
    (gogoproto.moretags)     = "yaml:\"coins\"",
    (gogoproto.nullable)     = false,
    (gogoproto.castrepeated) = "github.com/cosmos/cosmos-sdk/types.Coins"
  ];
}

Example #3:

The message for joining a pool in Osmosis' gamm (Generalized Automated Market Maker) module (https://github.com/osmosis-labs/osmosis/blob/main/x/gamm) would have type_url: osmosis.gamm.v1beta1.MsgJoinPool and a value of (https://github.com/osmosis-labs/osmosis/blob/main/proto/osmosis/gamm/v1beta1/tx.proto#L47):

message MsgJoinPool {
  string sender = 1 [ (gogoproto.moretags) = "yaml:\"sender\"" ];
  uint64 poolId = 2 [ (gogoproto.moretags) = "yaml:\"pool_id\"" ];
  string shareOutAmount = 3 [
    (gogoproto.customtype) = "github.com/cosmos/cosmos-sdk/types.Int",
    (gogoproto.moretags) = "yaml:\"pool_amount_out\"",
    (gogoproto.nullable) = false
  ];
  repeated cosmos.base.v1beta1.Coin tokenInMaxs = 4 [
    (gogoproto.moretags) = "yaml:\"token_in_max_amounts\"",
    (gogoproto.nullable) = false
  ];
}

The relation between the transaction types and the data listed above (proto) is very similar to the relation between Ethereum logs and smart contract events (ABI). This suggests that this type of integration would give subgraph authors an experience close to the current Ethereum-stack development process, while at the same time preserving the Tendermint types for people coming from the Cosmos ecosystem.

Because all of this data is heavily network-specific, it should be decoded inside the subgraph code using the subgraph author's type definitions, and not before.

Graph Node Tendermint integration

Eventually the Tendermint Firehose gRPC endpoint will be consumed by Graph Node, which will implement a dedicated FirehoseBlockStream, Tendermint-specific types, and filtering for triggers, ultimately reaching subgraph code.

As the previous section shows, to properly process network data the subgraph runtime environment has to be extended with a way to decode the protobuf-encoded bytes inside the Tendermint payload. A similar solution already exists within the subgraph API for JSON data parsing (https://thegraph.com/docs/developer/assemblyscript-api#json-api).

Subgraph authors would need to add a set of protobuf files to their subgraph projects, based on their needs. Then, using the command-line tool, they would generate TypeScript bindings for the specific protobuf types. The generated code should include methods to properly decode each structure:

my_type.fromBytes(data: Bytes): MyType
my_type.try_fromBytes(data: Bytes): Result<MyType, boolean>
my_type.fromString(data: String): MyType
my_type.try_fromString(data: String): Result<MyType, boolean>

Inside the runtime those functions should trigger Rust host bindings and, using prost (or a different protobuf library), decode the data into the desired structures based on the attached protobuf definitions.

Optionally, the subgraph runtime environment may also be augmented with some form of declarative 'pre-decoding' to enable more granular data filtering. This would be done by extending the subgraph manifest with the ability to declare (similarly to an ABI) which field in a structure has which type, or, in the case of types.Any, a set of possible types. This would allow subgraph authors to refer to those decoded fields in filters and to subscribe in handlers to specific types of messages.

Operational requirements

In order to index Tendermint subgraphs, Indexers will therefore need to run the following:

  • A Tendermint-based network node with augmented library
  • Tendermint Firehose Component(s)
  • A Graph Node with Firehose endpoint configured

GraphQL definition

No changes are anticipated to schema.graphql.

Areas for further development:

  • The initial Tendermint implementation targets base data, sent mostly in the block trigger and some event triggers derived from the EventList base structures (events). In further development this should be extended to include more triggers that are helpful to subgraph authors.

Backwards Compatibility


Dependencies


Rationale and Alternatives


Copyright Waiver

Copyright and related rights waived via CC0.


Thrilled to see this progressing @Joseph, thank you to you and the Figment team for your work on this.

One question on the detail, from an Indexer perspective - are the modifications to the Tendermint node library constantly changing/in development? Is there scope for pushing the required features upstream? This is similar to a question I raised around pushing Firehose instrumentation into Geth upstream.

I imagine the answer really relies on whether these methods for indexing become the de facto standard or not? From an Indexer's point of view, it makes planning and tracking software versions across many nodes much easier.


@cryptovestor I can give you some more insight on this.
As per the roadmap, we'll cover more networks using the data extraction approach. The Tendermint-based ecosystem happens to have an internal event bus from which we can take the data, which is why this particular way of network integration was picked.
It is also a different kind of network extraction: we don't print statements to stdout; instead we wrote an augmentation in one place in the node to return the data.
This is why any new version of Tendermint should not impact this particular code. And yes, it is still in development; it's in its final phases but still being developed 🙂


Great stuff @Joseph & Figment!

This suggests that instead of there being multiple handlers, differently triggered in specific cases, there will be one large handler which decodes the data and decides on the appropriate handlers accordingly?

It would be great to get a bit more detail on “filtering for triggers” - what triggers might be interesting for subgraph authors? Which are common to all chains using Tendermint, and are there more specific triggers which might be useful in certain contexts?

This is quite heavy overhead, particularly if all subgraph authors for a given chain will need to do the same generation of bindings. Do you anticipate canonical libraries being created to help subgraph developers create subgraphs on the same chain? Such libraries could reduce the need for individual developers to provide protobuf files.

This sounds like a desirable pattern, particularly in order to facilitate more efficient filtering and more specific handlers.

Some other questions:

  • I understand that Tendermint offers finality, does that mean that chain re-orgs are not a consideration for Tendermint subgraphs?
  • What kind of throughput do Tendermint chains require (in terms of seconds per block, triggers per block)?
  • Do you anticipate any challenges with protocol versioning going forwards, either at the level of Tendermint, or the chains?
  • Do you have specific example subgraphs in mind for indexing?
  • What changes do you anticipate to the subgraph.yaml spec?
  • Would you need to run a Tendermint-based network node for each network? Do you have any concerns about the operational requirements for indexers?
  • How do you plan to roll out support for different Tendermint-based networks?

Thank you @cryptovestor and @adamfuller. We also want to thank the other core devs for the initial feedback and answers about the Firehose stack.

What's mentioned above only concerns transaction handlers; other handlers can be implemented in the standard way.

Sure. Right now we support block and event triggers that are common to all chains using Tendermint. We might want to add some additional triggers and expand that section in our GIP. One kind of trigger that could be interesting is transaction triggers.

That’s a good point. And yes we were thinking of creating libraries with the protobuf files for different networks and subgraph authors would just import them in their code.

Yes, chain reorgs (in regards to block rollbacks, finality) are not a concern for Tendermint-based chains.

In terms of general throughput on Cosmos/Terra, it has been around 6s per block. Not sure at this point about triggers. We will keep it on our radar. Thanks for surfacing it.

Yes, for example right now we are debugging issues with decoding AssemblyScript u64 and i32 types.

There will be a new kind of data source, tendermint, for subgraphs.

Yes, each Tendermint-based network requires running a node. The general recommendation is to run at least 2 nodes per network for redundancy. As for concerns, the major issue with running a Tendermint-based network is data backfill, i.e. it will take a long time to index historical data using instrumented nodes.

Each Tendermint-based network would need its own forked node repository with the extractor, and we would like to provide public binaries for the networks we develop on.


There is no concrete scope for upstream integration (with regard to Tendermint) right now, but since we are working with StreamingFast on instrumentation efforts, this topic came up for discussion and there are definitely a lot of good ideas floating around: things like an example blockchain integration, an SDK, etc. Each node instrumentation implementation has its own edge cases and quirks, so if we decide to provide a common framework for node instrumentation we should spend a fair amount of time on research first.


OK - what types of handlers will be supported? Transaction, Block, Event?

Do all Tendermint chains use a common event pattern for their on-chain events? Would be great if so!

Sounds good.

Makes sense - was also thinking about support on the Hosted Service & testing with initial users?

Looking forward to indexing Tendermint!

Hey all! I’ve done quite a lot of work on indexing Cosmos chains and would love to chime in here with some thoughts:

I've built some working IBC indexing for pulling relayer-related data and also have quite a bit of experience in the websocket and event-streaming world of indexing. I've also built a number of tax data indexing programs over the years.

I'm currently encapsulating my best-practices Go client dev work in a lib called lens, which will have generic indexing functionality that will enable developers to build their own custom indexers.

Happy to help here!


Welcome to the Graph Community! Thrilled to see you here, Jack; I've followed your work since the Cosmos launch.


On the Tendermint level it's Block and Event. To decode transaction data we need to know the transaction type, which is network-dependent.

That’s right! We can use the same mechanism for all Tendermint networks.

Welcome @jackzampolin to The Graph Community!
