Best practices for web apps interacting with subgraphs

jjperezaguinaga · August 3, 2021, 3:55pm

Hi! Over the past few days, I’ve been supporting ETH Poster, a general-purpose social media app built on a ridiculously simple smart contract. We have a subgraph deployed that fetches the smart contract's events (the contract is at 0xA0c7A49916Ce3ed7dd15871550212fcc7079AD61 on Görli) and displays them on the site as “posts”. While building the app, I realized I'm missing some best practices for consuming subgraphs, which I would like to either learn or help define.

A web app that submits data to a smart contract and then immediately reads it back through a subgraph might face the following issues: (1) backpressure: data is updated so quickly that a query may miss the entries it needs; (2) refresh cost: data needs to be queried often, and without subscriptions to only the newly updated data, an entirely new query must be sent for a minimal delta; (3) metadata caching: some logic derived from on-chain data needs to be cached, and there is no documentation on the best way to do so.

In short, I’m currently facing the following challenges:

Backpressure. At the moment, the entire “feed” of posts is being “firehosed” into the smart contract (and thus the subgraph). Although a query with where: { user: 0x111... } can filter posts per user, a global query with first: n only returns n posts, so if n + m posts exist, the remaining m posts (where m can be arbitrarily large) are left out. In short, some posts could be hidden and never queried because newer posts keep arriving.
Refresh costs. Currently, the docs don't make it clear how to subscribe to, or query for, updates to a subgraph's data. Right now, my best option for updating the “feed” is to re-query the whole graph every X blocks and compute the delta between new and old posts. With the legacy explorer the only cost is bandwidth, but going forward, queries will cost GRT.
Cache layer. This is less related to the subgraph itself, but there are no general recommendations from The Graph on how to cache some of this information, or on good practices for building off-chain data. Some projects, like Ceramic Network, are working on these topics, but I’m curious about what other projects are doing.
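The “re-query everything and show the delta” approach from the refresh-costs point can be sketched as below. This is a minimal illustration, assuming each post has a unique id (as subgraph entities do); diffPosts is a hypothetical helper, not part of any Graph tooling.

```javascript
// Hypothetical helper: given the feed we already display (`previous`) and the
// result of a fresh full query (`latest`), return only the posts whose ids
// we have not seen before. Both arrays hold objects with at least an `id`.
function diffPosts(previous, latest) {
  const seen = new Set(previous.map((post) => post.id));
  return latest.filter((post) => !seen.has(post.id));
}
```

It works, but every refresh still transfers the full result set just to find a handful of new ids, which is exactly the cost the thread is trying to avoid.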

As I’m sure some of these problems have already been solved by other applications, community members, or projects, I’m looking forward to input and other best practices. Hopefully, we can put together a guide or a demo DApp that goes beyond a simple data-fetch query and shows, in a very didactic way, how to use subgraphs in production. If you already know of an open-source app that does this, please share it here, thanks!

adamfuller · August 3, 2021, 11:19pm

Hey, thanks for Posting (pun intended?)

So I think a key thing missing from your subgraph for your use case is a bit more metadata about when the event was seen on-chain. You can use the Ethereum API in AssemblyScript to access event.block.timestamp or event.block.number, which could be saved on your Post entity.

This, combined with pagination (see the documentation), should then let you iterate through the Posts by when they were posted, from most recent to oldest, so that you can also fetch the posts beyond the first n.
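A minimal sketch of that pagination, assuming the Post entity stores a blockNumber field (for example saved from event.block.number in the mapping). It uses The Graph's first/skip arguments with orderBy/orderDirection; fetchPage is a hypothetical, caller-supplied function that runs the query and returns the posts array.

```javascript
// Hypothetical query: assumes Post has a `blockNumber` field.
const POSTS_PAGE_QUERY = `
  query postsPage($first: Int!, $skip: Int!) {
    posts(first: $first, skip: $skip, orderBy: blockNumber, orderDirection: desc) {
      id
      content
      blockNumber
    }
  }
`;

// Walks the feed from newest to oldest, one page at a time.
// `fetchPage({ first, skip })` is supplied by the caller: it should run
// POSTS_PAGE_QUERY against the subgraph and resolve to the `posts` array.
async function fetchAllPosts(fetchPage, pageSize = 100) {
  const byId = new Map();
  for (let skip = 0; ; skip += pageSize) {
    const page = await fetchPage({ first: pageSize, skip });
    // Posts that arrive mid-walk push older ones to later offsets, so a post
    // can show up on two pages; keying by id drops those duplicates.
    for (const post of page) {
      if (!byId.has(post.id)) byId.set(post.id, post);
    }
    if (page.length < pageSize) break; // a short page means we reached the end
  }
  return [...byId.values()]; // Map keeps insertion order: newest first
}
```

Large skip values can be slow and may be capped by the indexer, so for very long feeds a cursor on an indexed field (for example filtering on id or blockNumber) is the usual alternative.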

This should also help with the refresh costs. Polling is currently the recommended approach, but if your application keeps track of the max post timestamp from an initial query, you can then fetch only the more recent posts with something like:

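query newPosts($max_seen_timestamp: BigInt!) {
  # postedAt is assumed to hold event.block.timestamp (a BigInt in the schema)
  posts(
    first: 1000
    where: { postedAt_gt: $max_seen_timestamp }
    orderBy: postedAt
    orderDirection: asc
  ) {
    id
    poster
    type
    content
    postedAt
  }
}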

This should hopefully remove the need to re-query the whole graph and compute a delta.
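On the client side, keeping track of that max timestamp might look like the sketch below. It assumes each post carries a postedAt field (a hypothetical name) and that, as with other BigInt fields, the subgraph returns it as a string in the JSON response, so values are compared as JavaScript BigInts. mergeNewPosts is a hypothetical helper.

```javascript
// state: { posts: [...], maxSeenTimestamp: "<string>" }
// newPosts: the result of the newPosts query above.
function mergeNewPosts(state, newPosts) {
  // Timestamps arrive as strings; BigInt avoids any precision issues.
  let max = BigInt(state.maxSeenTimestamp);
  for (const post of newPosts) {
    const ts = BigInt(post.postedAt);
    if (ts > max) max = ts;
  }
  // Show newest first: sort the delta by descending timestamp and put it
  // ahead of the posts that are already displayed.
  const newestFirst = [...newPosts].sort((a, b) => {
    const diff = BigInt(b.postedAt) - BigInt(a.postedAt);
    return diff > 0n ? 1 : diff < 0n ? -1 : 0;
  });
  return { posts: newestFirst.concat(state.posts), maxSeenTimestamp: max.toString() };
}
```

The next poll would then pass { max_seen_timestamp: state.maxSeenTimestamp } as the query variables. Because a subgraph indexes each block as a whole, posts sharing the last seen timestamp should already have been returned together, which is what makes the strict _gt filter reasonable here.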

I am not sure what you mean by point 3: subgraphs are designed to be a cached, off-chain data source for on-chain data. Are you perhaps referring to data that is not available on-chain?

Hope that is helpful, do let me know if I have misunderstood your challenges!

schmidsi · August 4, 2021, 1:42pm

Ah, very cool project! Simplicity is so beautiful

I agree with Adam on point 1: I would argue that you should add the block number to the post. This is already done in the ETHPoster/subgraph repository on GitHub (schema.graphql on the main branch). That also lets you filter and sort by block number.

Which brings me to point 2:
The Graph does not support subscriptions at the moment. To avoid over-querying your subgraph, you can watch your JSON-RPC provider for new blocks, query The Graph only when a new block arrives, and, as Adam showed above, query only for the delta. That said, you may still query when there is no new data and thus “waste” a query. Until we have implemented subscriptions properly, you can mitigate this by querying less often, e.g. only every X blocks. I would not worry about query fees too much, since they are cheap, especially for simple subgraphs like this one.
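The “only query when there is a new block, and at most every X blocks” idea can be sketched as a small poller. getBlockNumber and queryDelta are hypothetical, caller-supplied functions: the first would ask the JSON-RPC provider for the latest block number (eth_blockNumber), the second would run a delta query like the one Adam showed above.

```javascript
// Returns a `tick` function to call on a timer. Each tick asks the provider
// for the latest block and only hits the subgraph once at least
// `everyNBlocks` blocks have passed since the last subgraph query.
function createBlockGatedPoller({ getBlockNumber, queryDelta, everyNBlocks = 1 }) {
  let lastQueriedBlock = null;
  return async function tick() {
    const block = await getBlockNumber();
    if (lastQueriedBlock !== null && block - lastQueriedBlock < everyNBlocks) {
      return null; // not enough new blocks yet: skip the subgraph query
    }
    lastQueriedBlock = block;
    return queryDelta(block);
  };
}
```

Raising everyNBlocks trades freshness for fewer (paid) subgraph queries, which is the knob described above.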

jjperezaguinaga · August 4, 2021, 10:28pm

Thanks, everyone, for replying! I agree that tracking blocks is the best way to approach this, and the delta can be requested progressively every X amount of time. To do so, I could estimate the number of blocks that have passed based on the network's average block time, and then build a new query from that estimate.
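Estimating the block height from elapsed time, without a JSON-RPC node, could be as simple as the sketch below. avgBlockTimeMs is an assumption you supply per network (roughly 15 s on Görli); real block times vary, so the result is only an approximation. estimateCurrentBlock is a hypothetical helper.

```javascript
// lastKnownBlock: e.g. the blockNumber of the newest post returned by the subgraph.
// lastKnownAtMs: wall-clock time (ms) when that block was observed.
function estimateCurrentBlock(lastKnownBlock, lastKnownAtMs, nowMs, avgBlockTimeMs) {
  const elapsed = Math.max(0, nowMs - lastKnownAtMs); // ignore clock skew going backwards
  return lastKnownBlock + Math.floor(elapsed / avgBlockTimeMs);
}

// One minute after block 5000000 was seen, at ~15 s per block:
console.log(estimateCurrentBlock(5000000, 0, 60000, 15000)); // 5000004
```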

I could definitely listen to a JSON-RPC provider, but I want to avoid shipping one in the app altogether. Unfortunately, right now having a JSON-RPC provider in a DApp means:

Providers are able to track your usage of the app via your IP
DApp creators need to foot the bill for the provider used
The application could go down entirely if Infura, Alchemy, et al. go down
It does not take full advantage of the reliability of The Graph ecosystem

As for the cache question, it works the same way: I’ll store content in localStorage on the client and then only fetch and store the missing delta, based on the latest block and timestamp.
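A minimal sketch of that localStorage cache. It only relies on the standard Web Storage getItem/setItem methods, so window.localStorage can be passed in the browser; the "poster:feed" key, the { posts, latestBlock } shape, and the blockNumber field on posts are all hypothetical choices for illustration.

```javascript
const CACHE_KEY = "poster:feed"; // hypothetical key

// Reads the cached feed, or an empty one on first load.
function readCache(storage) {
  const raw = storage.getItem(CACHE_KEY);
  return raw ? JSON.parse(raw) : { posts: [], latestBlock: 0 };
}

// Merges a delta (posts queried with something like blockNumber_gt:
// cache.latestBlock) into the cache, skipping posts we already have,
// and persists the result.
function writeDelta(storage, delta) {
  const cache = readCache(storage);
  const known = new Set(cache.posts.map((p) => p.id));
  const fresh = delta.filter((p) => !known.has(p.id));
  // blockNumber may arrive as a string (BigInt field), hence Number().
  const latestBlock = fresh.reduce((max, p) => Math.max(max, Number(p.blockNumber)), cache.latestBlock);
  const next = { posts: fresh.concat(cache.posts), latestBlock };
  storage.setItem(CACHE_KEY, JSON.stringify(next));
  return next;
}
```

On startup the app would render readCache(localStorage).posts immediately and then only ask the subgraph for posts after latestBlock.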

jjperezaguinaga · August 7, 2021, 11:56pm

Following up here: thanks to @adamfuller's and @schmidsi's advice, I implemented the following changes:

Using block numbers within my subgraph's entities
Manually polling data as I “see it” on the blockchain

Before I jump into that, I would strongly recommend that any app developer querying a subgraph via Apollo, React’s popular GraphQL client, use its lazy querying capability (useLazyQuery) instead of the default “query on load” behavior. This gives more control over when queries run, so you can apply the heuristics described in this post. It looks something like this:

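import { useLazyQuery } from '@apollo/client'

// Inside the component: getPosts() runs GETALLPOSTS only when called,
// always against the network
const [getPosts, { loading, error, data }] = useLazyQuery(GETALLPOSTS, {
  fetchPolicy: 'network-only',
})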

The fetchPolicy is important: network-only tells the client to always fetch the data from the network instead of reading it from the cache. Since subscriptions are not available and we are fetching data manually, that is what we want here. To control when the query runs, you can use React’s useEffect hook, which can react to a particular state change passed to the component displaying the subgraph data:

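// Assumes the reducer's initial state sets getAllPostsNeedsReload to true,
// so the posts are also loaded on mount.
useEffect(() => {
  // Only run the lazy query when the reload flag is set; otherwise resetting
  // the flag to false would trigger a second, redundant query.
  if (!getAllPostsNeedsReload) return
  getPosts()
  dispatch({
    type: 'SET_SUBGRAPH_GETALLPOSTS_RELOAD',
    needsToReloadGetAllPosts: false,
  })
}, [getAllPostsNeedsReload])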

In this case, getAllPostsNeedsReload is a flag that tells the component to manually re-run the subgraph query (getPosts in our case). To avoid a race condition with the indexer’s block sync (the new post may not be indexed yet when we query), I delay the state change sent to the reducer by a fixed amount of time:

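// After the user submits a post, wait SUBGRAPH_RELOADING_TIME_IN_MS to give
// the indexer time to process the new block, then flag the query for reload.
setTimeout(() => {
  dispatch({
    type: 'SET_SUBGRAPH_GETALLPOSTS_RELOAD',
    needsToReloadGetAllPosts: true,
  })
  dispatch({
    type: 'SET_SUBGRAPH_RELOAD_INTERVAL_LOADING',
    isReloadIntervalLoading: false,
  })
}, SUBGRAPH_RELOADING_TIME_IN_MS)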

This state change is triggered by a manual action from the user (in this case, a new post), so we know for sure that we need to run a query. The next steps will be:

Obtain the blockNumber of the latest post and, after posting, only query posts from later blocks, to avoid re-querying everything before it (since we already have it).
Add a JSON-RPC wss (WebSocket) provider from which I can listen to the contract’s events, so my UI updates when a user in a separate browser uses the app*.

You can see the latest changes in the latest version of Poster.

*FWIW, in the long term I see this only being done either through The Graph, once subscriptions are available (if ever), or via more specific heuristics that do not hit the indexers receiving the queries too heavily. There are some privacy and centralization issues with relying on these JSON-RPC endpoints that we will need to ignore in the short term.
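One possible alternative to the fixed SUBGRAPH_RELOADING_TIME_IN_MS delay (not what Poster currently does) is to poll the subgraph's _meta field, which graph-node exposes as _meta { block { number } }, until it has indexed the block containing the user's transaction (e.g. the blockNumber from the transaction receipt). runQuery is a hypothetical, caller-supplied function that sends a GraphQL query and resolves to the parsed data object.

```javascript
const META_QUERY = `{ _meta { block { number } } }`;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolves to true once the subgraph has indexed `targetBlock`, or false
// after `maxAttempts` polls so the caller can fall back to a fixed delay.
async function waitForIndexedBlock(runQuery, targetBlock, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const data = await runQuery(META_QUERY);
    if (data._meta.block.number >= targetBlock) return true; // safe to re-query posts
    await sleep(intervalMs);
  }
  return false;
}
```

Each _meta poll is itself a query, so a sensible intervalMs keeps this cheaper than repeatedly re-running the full posts query.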
