We are building a data analytics library, Bubbletea, to encourage more people to visualize data on The Graph network. As we think through different use cases, we notice there isn’t much diversity in the subgraphs: most of them are very protocol-specific. We would like to make some new subgraphs that can be more widely used among the community.
A subgraph that contains a mapping of Ethereum addresses to names. The name could be an ENS name, a smart contract name, and later even more customized tags such as “Coinbase wallet address”. We have seen data services such as Nansen, Chainalysis, Dune, etc. building this data set in a centralized database. It would be very nice to have it on The Graph network.
A subgraph that contains more off-chain market data. There is a ton of market data that is not offered by Chainlink but only by centralized service providers. For example, I want to combine the call vs. put volume of a specific token over time with other on-chain token metrics in one dashboard.
Correct me if I am wrong, but it seems like building the above subgraphs is technically doable. However, the source of the data isn’t directly from Ethereum mainnet. And the first subgraph will also require some community-driven contribution and data curation. I wonder how indexers, delegators and curators will feel about that.
Hi @huangkuan, this is really interesting. ENS names can obviously be found on-chain, but the other metadata you describe (custom tags, smart contract names) is off-chain information. I agree that there is real value in this being available as a public good - rather than a subgraph, it would seem to be more like a data registry? In that case the challenge becomes curation / moderation, as you say.
On the second idea (off-chain market data) - are you referring to information from centralised exchanges?
Thanks for sharing!
I assume when you say “data registry”, you mean a file or a traditional database? Under that assumption, I think a subgraph might make more sense:
As a developer and potential user, I’d want to be able to send an address or a batch of addresses to an API and expect it to return a list of address-to-name mappings. A subgraph is exactly that API, one I can “query”. Here is an example: I want to build a table that tracks the contracts which received the most deposits in the past 24 hours. Once I have figured out all the contract addresses, I need a way to turn them into names.
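To make the batch-lookup idea concrete, here is a minimal sketch of how a client could query such a subgraph over GraphQL. The endpoint, the `nameRecords` entity, and its fields are assumptions for illustration - no such subgraph is deployed yet.

```typescript
// Build a GraphQL query that resolves a batch of addresses to names.
// The `nameRecords` entity and its fields (address, name, source) are
// hypothetical - they assume a schema like the one discussed above.
function buildNameQuery(addresses: string[]): string {
  // Subgraph IDs/addresses are conventionally stored lowercased.
  const list = addresses.map((a) => `"${a.toLowerCase()}"`).join(", ");
  return `{
  nameRecords(where: { address_in: [${list}] }) {
    address
    name
    source
  }
}`;
}

// POST the query to the subgraph's HTTP endpoint (URL is a placeholder).
async function resolveNames(endpoint: string, addresses: string[]) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: buildNameQuery(addresses) }),
  });
  return (await res.json()).data.nameRecords;
}
```

The `_in` filter is standard in subgraph-generated GraphQL APIs, which is what makes the batch case ergonomic compared to one request per address.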
A data set like that requires ongoing maintenance. There need to be incentives which encourage people to contribute and curate high-quality data. The better the data set gets, the more queries will be sent to that subgraph, and the more fees indexers and curators can earn. I feel part of this positive feedback loop is already baked into the design of the subgraph network.
As a developer, if I am already building an application that interacts with subgraphs, I might as well get more from subgraphs. Shameless self-promotion: our library Bubbletea supports loading multiple subgraphs in parallel.
For the second idea, yes, I was referring to information from centralized exchanges, but it could also apply to any off-chain data. For the sake of this conversation, let us focus on the APIs of centralized exchanges. There are tons of APIs which are not covered by Chainlink; to name a few: Deribit, FTX, dYdX. Since not all of their APIs store historical data, I would typically use a centralized database as an intermediate layer to store real-time API data over time. For example, I monitor both on-chain stablecoin activities and off-chain derivative funding rate and open interest data to track the sentiment of the market. But the historical funding rate and open interest data are not accessible via the APIs. To build a decentralized app, I would much rather get all my data from subgraphs instead of from different sources.
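The intermediate layer described above boils down to polling a real-time API and stamping each response with an observation time before persisting it. A minimal sketch, assuming a made-up response shape (no actual exchange's API is modeled here):

```typescript
// Hypothetical raw response from a derivatives exchange API.
// Real APIs (Deribit, dYdX, ...) differ - this shape is an assumption.
interface RawFundingRate {
  instrument: string;    // e.g. "BTC-PERPETUAL"
  funding_rate: number;  // current rate, as a decimal
  open_interest: number;
}

// The record we persist. Since the API is real-time only, we add
// our own timestamp so the series can be replayed historically.
interface FundingSnapshot {
  instrument: string;
  fundingRate: number;
  openInterest: number;
  observedAt: number; // unix seconds, assigned at poll time
}

function toSnapshot(raw: RawFundingRate, nowSec: number): FundingSnapshot {
  return {
    instrument: raw.instrument,
    fundingRate: raw.funding_rate,
    openInterest: raw.open_interest,
    observedAt: nowSec,
  };
}
```

Today these snapshots would land in a centralized database; the whole point of the thread is that they could instead be published somewhere a subgraph can index.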
Let me provide some background for the above ideas. We want to build a new type of data analytics product on top of this emerging decentralized query layer. Traditionally, data has been quite a “centralized asset”: centralized data warehouses, workflows, software layers, and likely the entire stack.
We see The Graph as a way to fundamentally unbundle this centralized asset. When you can empower anyone to index any data without worrying about scalability (yay indexers!) and give them a way to monetize it, something refreshing can happen. It is true that The Graph was originally designed to index on-chain data, but the fact that we are trying to “hack” it to meet our needs shows there is a much larger potential.
When I said “data registry”, I was thinking about the means of keeping it up to date (i.e. based on user contributions, based on off-chain data). I guess that registry could be anchored on-chain, which would then make it quite trivial to put in a subgraph. On the second point, I guess again that the information could be anchored on-chain (likely on a cheaper chain), which would then be readily accessible in a subgraph.
Wondering if there were other approaches to sourcing triggers for these subgraphs that you had in mind? i.e. would the data be accessible in a centralised database / APIs / IPFS / elsewhere?
Definitely agreed on unbundling centralised data - I think the challenge is just where that data needs to be in order to be indexed in a trustless way (currently we can only really get that certainty from blockchains).
@adamfuller Actually, Graph Node doesn’t support async function calls. That eliminates the possibility of retrieving data from external APIs and writing it into a subgraph. Bummer.
cc @Slimchance
Can you elaborate on that? What do you mean by “sourcing triggers”?
Subgraphs are event-based - i.e. they (currently) rely on on-chain events to trigger mapping execution. I was wondering what events you had in mind to trigger processing of the data you are interested in.
That eliminated the possibility of retrieving data from external APIs and write them into a subgraph
Graph Node does not currently support calls to arbitrary APIs outside IPFS, and we are looking to improve the robustness of that IPFS functionality soon. The difficulty with IPFS, and even more so with arbitrary external APIs, is that it is not possible to index deterministically, which is a network requirement. What mechanism would you have in mind to keep the subgraph up to date with the latest data?
Subgraphs are event-based - i.e. they (currently) rely on on-chain events to trigger mapping execution. I was wondering what events you had in mind to trigger processing of the data you are interested in.
The most appropriate trigger would be a block trigger, but it’s not currently recommended by the team. We could also make it work with an event or call trigger, which is probably not efficient but can get the job done.
The difficulty with IPFS, and even more so with arbitrary external APIs, is that it is not possible to index deterministically, which is a network requirement.
For our use case, we’d like to integrate off-chain data into Graph entities, as described in the subgraph’s GraphQL schema. So we don’t actually need Graph Node to index the whole of IPFS or arbitrary off-chain data.
What mechanism would you have in mind to keep the subgraph up to date with the latest data?
For now, we plan to add files to IPFS periodically, so a block trigger would work to fetch the latest data once IPFS access is established.
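The periodic-publishing step above can be sketched as a function that serializes the latest records into one canonical JSON snapshot. Sorting the records makes the same data always produce the same bytes, and therefore the same IPFS content hash, which is what keeps the indexing deterministic. The record shape is illustrative, not a fixed format.

```typescript
// One record in the snapshot - shape is an assumption for illustration.
interface SnapshotRecord {
  key: string;   // e.g. instrument or address
  value: number; // e.g. funding rate or open interest
}

// Serialize records into a canonical JSON snapshot for pinning to IPFS.
// Sorting by key makes serialization deterministic: identical data always
// yields identical bytes, hence an identical IPFS content hash.
function buildSnapshot(records: SnapshotRecord[], epoch: number): string {
  const sorted = [...records].sort((a, b) => a.key.localeCompare(b.key));
  return JSON.stringify({ epoch, records: sorted });
}
```

On the subgraph side, a block handler would then fetch this snapshot by its hash (once IPFS access from mappings is supported) and write each record into the store as an entity.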