Unit Testing in AssemblyScript

Hi! This is Dennison from WithTally.com. We’re building on top of The Graph, using subgraphs to power deep, rich data about governance ecosystems and their accompanying token ecosystems.

Problem:

Our subgraphs are quite large and take many days to sync, but they provide us with fantastic insight into what is happening inside governance ecosystems.

The problem, however, is that because our subgraphs go beyond the simple “indexing events” use case, there are plenty of opportunities to introduce bugs, and we frequently find ourselves stuck in the rather long development cycle of develop, deploy, sync, test.

Because we are sensitive to the accuracy of our data, we have developed an internal tool that monitors our subgraphs in production and alerts us when it spots any anomalies in the live data.

However, this solution is not perfect and introduces its own problems: we have to compare the method we use to aggregate data in the subgraph against a different formula for aggregating data in the production monitor, when in both cases the source of truth is actually the blockchain, not either “view” of the data we generate.

At its core the problem is this:

The blockchain is the source of truth; a subgraph is a “view” on this source of truth, which downstream applications then interpret as a source of truth.

Verifying that the subgraph’s “view” generated by the mappings is correct is problematic, because it requires a comparison tool which itself needs to generate a “view” on the underlying source of truth (the blockchain) to test against.

Unfortunately, The Graph ecosystem (and AssemblyScript at large) lacks a comprehensive unit testing tool. This means the test cycle is currently either very cumbersome (creating entire demo ecosystems: local Graph nodes, local smart contract deployments, etc.) or exceptionally long: deploying to production, waiting (in our case, DAYS) before a bug in a new deployment becomes evident, making a fix, and then resyncing.

Solution

What is needed is the ability to test mappings in isolation and in concert, to understand, prior to deployment, whether they are doing what we think they are doing, and to test them against deterministic data sources/events.

This is not a crazy thing to do with AssemblyScript; however, in the case of The Graph, it is necessary to create a sandbox environment that mocks The Graph node itself. It’s not as simple as compiling the WASM modules and including them in JavaScript to use with something like Mocha & Chai, but it’s not so far off.
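As a very rough sketch of that direction (the module layout, host import namespace, and export names below are guesses on my part, and a real harness would also have to manage AssemblyScript memory to build event objects):

```typescript
import { readFileSync } from "fs";
import { expect } from "chai";

describe("compiled mapping", () => {
  it("loads and exposes the transfer handler", async () => {
    // Stub the host functions the mapping imports from graph-node.
    // The "index" namespace and function names are guesses, not the real host ABI.
    const hostStubs = {
      index: {
        "store.set": (): void => { /* record writes here for assertions */ },
        "store.get": (): number => 0,
      },
    };
    const wasm = readFileSync("./build/Mapping/Mapping.wasm");
    const { instance } = await WebAssembly.instantiate(wasm, hostStubs as WebAssembly.Imports);
    // Actually invoking the handler means constructing the event object in WASM
    // memory, which is exactly the part the sandbox/mocking layer would own.
    expect(instance.exports).to.have.property("handleTransfer");
  });
});
```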

We already have an idea of how this could be done, and we have been speaking with another team that would work with us and has the technical capacity to tackle it. We’ve also spoken with a number of other individuals in the space who feel that unit testing is key to building a more robust Graph ecosystem.

Conclusion

Unit testing is necessary for The Graph to be the solution for teams running mission-critical subgraphs in production. We would be happy to spearhead a grant proposal, but we also recognize that this is a project that would probably involve a larger number of participants. We’d be interested to hear what others think.

15 Likes

This is sorely needed for subgraph developers! Would be fantastic to see the tooling and workflows around subgraph development improved :clap:

7 Likes

Thanks for kicking this off Dennison!

I just had a great call with Sebastian from Enzyme (@fubhy). We spec’d out what a unit testing framework could look like. We believe there are two important parts to this: the unit testing framework itself, and a scaffolding cli that makes it easy to replicate edge cases and live data. Below are our thoughts, would love feedback from anyone.

Unit Testing Framework

The unit testing framework allows developers to test their mapping logic against a known store state and with precooked test fixtures (events). Assertions can be made using a snapshot-style approach.

User Stories

As a user I want to hydrate the store with a certain state

Users must be able to hydrate the store with a known state of entities. Ideally both of the following (a sketch follows the list):

  • Hydrate state programmatically using Entity objects
  • Hydrate state using a JSON blob
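For illustration, hydration in a hypothetical test API might look like the sketch below (the `subgraph-test` module and both function names are placeholders, not an existing package):

```typescript
import { hydrateStore, hydrateStoreFromJson } from "subgraph-test"; // hypothetical package and names

// Programmatically, with entity values shaped by the subgraph's schema…
hydrateStore("Token", {
  id: "0x1",
  symbol: "TAL",
  totalSupply: "1000000",
});

// …or from a JSON blob captured from a real node or a previous sync.
hydrateStoreFromJson("./fixtures/store-block-12000000.json");
```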

As a user I want to call a mapping function with an event

The user must be able to create an event Entity and pass it to a mapping function that is bound to the hydrated store.
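A sketch of what that could look like, reusing the same hypothetical API (`mockEvent` and `callMapping` are invented names; `handleTransfer` stands in for any real handler):

```typescript
import { mockEvent, callMapping } from "subgraph-test"; // hypothetical package and names
import { handleTransfer } from "../src/mapping";        // the subgraph's own handler

// Build an event with the parameters the handler expects.
const transfer = mockEvent("Transfer", {
  from: "0x0000000000000000000000000000000000000000",
  to: "0x1111111111111111111111111111111111111111",
  value: "500",
});

// Invoke the mapping against the previously hydrated store.
callMapping(handleTransfer, transfer);
```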

As a user I want to call all of the mappings with event fixtures

The user must be able to call the mappings with test fixtures (a sketch follows the list).

  • Fixtures could be set up programmatically as a list of Events
  • Fixtures could be loaded from a JSON blob
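Running a whole fixture set might then be as simple as (again, hypothetical names):

```typescript
import { loadFixtures, runMappings } from "subgraph-test"; // hypothetical package and names

// Fixtures could equally be built programmatically as an ordered list of events.
const fixtures = loadFixtures("./fixtures/events-12000000-12000100.json");
runMappings(fixtures); // dispatches each event to its configured handler, in order
```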

As a user I want to assert the state of the store

The user should be able to assert the final state of the store in a snapshot-style approach (compare output against expected); a sketch follows the list.

  • Must be able to assert entities
  • Possibly assert entire state of store to assert non-existence as well
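A sketch of both assertion styles, with the same caveat that the framework names are invented:

```typescript
import { readFileSync } from "fs";
import { expect } from "chai";
import { assertEntity, storeSnapshot } from "subgraph-test"; // hypothetical package and names

// Assert an individual entity…
assertEntity("Token", "0x1", { totalSupply: "1000500" });

// …or compare the whole store against an expected snapshot, which also covers
// non-existence (anything absent from the snapshot must not be in the store).
const expected = JSON.parse(readFileSync("./fixtures/expected-store.json", "utf8"));
expect(storeSnapshot()).to.deep.equal(expected);
```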

Scaffolding CLI

It would be really nice to be able to generate hydration and fixture data from mainnet, a testnet, or a local node. This could be added on to the framework above, and it would make it easy to replicate edge cases from live data.

User stories

As a user I want to generate hydration and fixture data between two blocks from a node

The user would be able to generate a JSON blob of store state for the starting block, and generate fixtures for the subsequent blocks up until the final block. This could be run against any node, whether local, testnet, or mainnet.
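To make the idea concrete, one way such a CLI might gather raw event data is sketched below; it assumes an ethers v5 JSON-RPC provider and a known data source address, and the logs would still need decoding against the ABI before they become usable fixtures:

```typescript
import { providers } from "ethers";
import { writeFileSync } from "fs";

async function dumpFixtures(rpcUrl: string, address: string, fromBlock: number, toBlock: number): Promise<void> {
  const provider = new providers.JsonRpcProvider(rpcUrl);
  // Fetch all logs emitted by the data source between the two blocks.
  const logs = await provider.getLogs({ address, fromBlock, toBlock });
  writeFileSync(`./fixtures/events-${fromBlock}-${toBlock}.json`, JSON.stringify(logs, null, 2));
}

// Works against any node: local, testnet, or mainnet.
dumpFixtures("http://localhost:8545", "0x1111111111111111111111111111111111111111", 12000000, 12000100)
  .catch(console.error);
```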

2 Likes

Fantastic! This pretty much hits our wish-list as well.

A couple other items that we would put on the User stories are:

Assert State of the Subgraph
This is similar to the state of the store, but specifically for catching what conditions cause the subgraph itself to fail.

Performance Benchmarking
Some architecture decisions are dramatically less performant than others, and can even affect whether a subgraph ever reaches the head of the chain. It would be great if we could benchmark mappings in order to optimize them.

In Production Assertions
This might fall outside the scope, and perhaps a good testing setup should make it unnecessary, but if there were some invariant we could assert and cause the subgraph to fail when it is violated, it would help prevent silent errors where data is corrupt.
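For what it’s worth, something close to this can already be approximated inside a mapping today by checking the invariant and halting the subgraph with `log.critical` from `@graphprotocol/graph-ts` when it is violated; a rough sketch (the entity and field names are made up):

```typescript
import { log } from "@graphprotocol/graph-ts";
import { Token } from "../generated/schema"; // hypothetical generated entity

// Hypothetical helper called at the end of a handler to enforce an invariant.
export function assertSupplyInvariant(token: Token): void {
  if (token.totalSupply.lt(token.circulatingSupply)) {
    // log.critical aborts processing and marks the subgraph as failed,
    // surfacing the corrupt data instead of letting it propagate silently.
    log.critical("Token {} violates invariant: totalSupply < circulatingSupply", [token.id]);
  }
}
```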

2 Likes

I really like the state-of-the-subgraph idea. Do you mean asserting a failure code? You were mentioning that on Twitter.

I feel like true performance benchmarking would require the whole system to be set up. I imagine the unit tests would just run the mapping code. We could time the test runs and log the results, perhaps?

1 Like

I’d guess it would be possible to get a rough estimate of the performance “cost” of a mapping by statically analyzing the mapping code, much like query complexity is evaluated for a GraphQL query.

It’s much harder to infer the “actual” performance impact of certain scenarios, e.g. contract calls, because depending on the complexity of the underlying contract code, a call can be fast or slow with different inputs (and on-chain state).

In general, at least in our observation, there are multiple “tiers” of performance impacts…

#1 Contract calls (by far the slowest)
#2 Store writes (including deletes)
#3 Spawning data source templates
#4 Store reads
#5 Logging

With the exception of unbounded for or while loops (obviously), arbitrary logic, including finite loops, arithmetic operations, etc., can be ignored; in our experience it has no measurable performance impact.

There are a couple of things involving loops, obviously, that can negatively impact performance massively, e.g. making contract calls in loops over large arrays. It’s hard to detect those specifically, but it would be possible to find them in a static analysis of the code and at least “flag” them.

A simple tool that statically analyses the code by traversing the AST and assigning scores (with multipliers for loops, etc.), weighted according to the “tiers” I mentioned, would be super useful imho.
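As a very rough illustration of the idea (not an existing tool), a scorer could traverse the AST with the TypeScript compiler API, since AssemblyScript sources parse as TypeScript; the weights and the call-site detection below are placeholders:

```typescript
import * as ts from "typescript";
import { readFileSync } from "fs";

// Placeholder weights following the tiers above; real numbers would need measurement.
const WEIGHTS: Record<string, number> = {
  contractCall: 100, // tier 1
  storeWrite: 20,    // tier 2
  template: 10,      // tier 3
  storeRead: 5,      // tier 4
  logging: 1,        // tier 5
};
const LOOP_MULTIPLIER = 10;

function scoreFile(path: string): number {
  const source = ts.createSourceFile(path, readFileSync(path, "utf8"), ts.ScriptTarget.ES2020, true);
  let score = 0;

  function visit(node: ts.Node, loopDepth: number): void {
    const multiplier = Math.pow(LOOP_MULTIPLIER, loopDepth);
    if (ts.isCallExpression(node)) {
      // Very naive detection by call-site text; a real linter would resolve bindings.
      const callee = node.expression.getText(source);
      if (callee.includes(".try_") || callee.startsWith("contract.")) score += WEIGHTS.contractCall * multiplier;
      else if (callee.endsWith(".save") || callee.startsWith("store.remove")) score += WEIGHTS.storeWrite * multiplier;
      else if (callee.endsWith(".create")) score += WEIGHTS.template * multiplier;
      else if (callee.endsWith(".load") || callee.startsWith("store.get")) score += WEIGHTS.storeRead * multiplier;
      else if (callee.startsWith("log.")) score += WEIGHTS.logging * multiplier;
    }
    // Anything inside a loop gets its score multiplied.
    const nextDepth =
      ts.isForStatement(node) || ts.isWhileStatement(node) || ts.isForOfStatement(node)
        ? loopDepth + 1
        : loopDepth;
    ts.forEachChild(node, (child) => visit(child, nextDepth));
  }

  visit(source, 0);
  return score;
}

console.log(scoreFile("./src/mapping.ts"));
```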

3 Likes

A linter for subgraphs! Brilliant. A linter would help all users be aware of the performance hit of their code.

We had several instances where the values were available in the event, but the developer decided to pull the data from the smart contract. Knowing the cost, even ballpark, of small decisions like that would be very helpful.
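Schematically (the `Transfer` event and `ERC20` binding names here just stand in for whatever the subgraph’s codegen produces), the difference is:

```typescript
import { BigInt } from "@graphprotocol/graph-ts";
import { Transfer, ERC20 } from "../generated/ERC20/ERC20"; // schematic codegen paths

// Cheap: the amount is already present in the event payload.
export function amountFromEvent(event: Transfer): BigInt {
  return event.params.value;
}

// Expensive: an eth_call back to the contract for data the event already carried.
export function amountFromContract(event: Transfer): BigInt {
  return ERC20.bind(event.address).balanceOf(event.params.to);
}
```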

That being said, this feels like it’s outside the scope of a unit testing framework. A linter would be a project in itself, with scope that could include extensions for major editors like VSCode.

I also want to mention that @fubhy has some great ideas around end-to-end testing locally, but we thought that should be a separate proposal.

We need a backlog! So many great ideas. So far we have:

  1. Unit testing framework
  2. Subgraph Linter
  3. End-to-end Test Framework
  4. Subgraph invariant system (might need some tooling from the Graph! Needs a GIP perhaps?)

I’d like to propose that we keep this grant focused on the unit testing, and we can tackle the other projects separately. Let’s keep the scope super tight to ensure a swift delivery.

I think it would be ideal if we collaborated on the scope of work for the grant proposal. I’ve created a GitHub repository here:

Grant Proposal GitHub Repo

Ideally we can collaborate there, then present the final scope of work once we’ve reached consensus.

@dennison I’ve added the subgraph failure assertion & timing requirement to the above project.

Does that work for you all? Let’s keep throwing ideas down! When we’re happy we can lock down the scope.

3 Likes

Yeah works for me, I agree we should keep the scope pretty narrow so that we can move forward quickly.

1 Like

@dennison @fubhy How about we schedule a call next week to finalize what we’re thinking?

We can:

  • Finalize requirements (+ any non-functional)
  • Estimate the scope of work
  • Establish a budget

We’re a combo of PST, EST and CET timezones. So perhaps we can meet early my time? 9:00am PST or so.

1 Like

Yes that works for me. Preferably Tuesday or Thursday but Wednesday would work too if need be.

Works for me. Right now I could do a bit later, after 3pm EST on Tuesday. Wednesday is a bit busier for me though. I could also do Thursday/Friday from 4:30pm EST onwards.

I love watching knowledgeable people talk about what they’re good at.

1 Like

Another request:

State of Contract Calls

I know that contract calls are not considered best practice, but not all contracts are written such that events give a complete picture of the smart contract state.

It would be interesting if we could mock contract calls to see how our subgraphs behave on reverted or failed calls.
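Generated contract bindings already expose `try_`-style calls that surface reverts, so mocking would mainly mean letting a fixture flip `result.reverted` for a given call. A schematic mapping-side helper the tests could exercise (binding names depend on the subgraph’s codegen):

```typescript
import { Address, BigInt } from "@graphprotocol/graph-ts";
import { ERC20 } from "../generated/ERC20/ERC20"; // schematic codegen path

// With a mocked revert, `result.reverted` is true and we take the fallback path
// instead of crashing the handler.
export function safeBalanceOf(contractAddress: Address, holder: Address): BigInt {
  const result = ERC20.bind(contractAddress).try_balanceOf(holder);
  return result.reverted ? BigInt.fromI32(0) : result.value;
}
```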

1 Like

Ian from Uniswap here. Just wanted to jump in and share that we’d greatly benefit from a testing solution like this.

The Uniswap v2 subgraph takes over 30 days to sync now. Yes, you can read that again; it’s true.

It’s very difficult to be confident in new versions of the mappings without isolated testing.

4 Likes

@dennison Super interesting point; we’ll need fixtures not only for events but also for contract calls. I’ve added that to the spec.

@ianlapham A 30-day sync time is insane. That’s like Dennison’s WETH subgraph. The latest iteration of the spec is on GitHub here. It would be great to have your input so that the spec meets your needs as well; please comment or open a GitHub PR!

Finding a time to meet might be challenging, as Dennison is available later in the day and Sebastian is on CET. How flexible are you guys? The earlier the better, Dennison, and the later the better, Sebastian.

1 Like

We found a time to meet this Friday at 11am PST. If you @ianlapham or anyone else would like to join, let me know and I’ll shoot you an invite.

3 Likes

Great - that should work for me. Thanks for finding that time - my email is ian@uniswap.org

3 Likes

Hi everyone! This is Seba from Protofire. Our team has been building subgraphs for about two years (20+ of them), and as the first member of that team I would like to join the discussion and put in my 2 cents on this topic.

We have also faced challenges with both small subgraphs with high traffic and large subgraphs that take weeks to index to the latest block. In both cases we have felt the need for a stronger testing environment and a granular way to isolate certain scenarios, whether to reproduce a bug or to avoid waiting hours or days of indexing to test a fix or validate an improvement. We could not agree more that unit testing is an increasingly imperative need for providing high-quality subgraphs.

I also wanted to mention that we have been exploring the idea of developing a linter for subgraphs, which also seems like a brilliant way to raise the quality of community-developed subgraphs. We want to capitalize on the experience gained developing subgraphs during this time (in which we have identified some patterns) and on our work developing Solhint.

Our team will carefully analyze the proposals made in this thread and get back to you with feedback.

5 Likes

Great call today everyone! I’m really pleased to see people come together like this.

Our big decisions today were:

  1. Reducing the scope of work to a pure, programmatic unit testing framework
  2. Estimating the dev work and budget for the project.

The Scope of Work has been updated.

We realized that there are many things we want to build, but we should start simple so that it can be shipped quickly. What we learn from this tool will inform the next iteration.

@sistemico It would be great to have your input!

Next steps are to find a suitable team to build it.

2 Likes

I want to follow up with the latest news: LimeChain is going to build it! It’ll be great to have a professional dev shop work on the project.

4 Likes