Personal Subgraphs

I have been thinking about a concept I call “Personal Subgraphs” for a few weeks and would like to propose the idea to the community. I hope this creates discussion and ideas around Subgraphs built for a User instead of a Protocol, as well as the privacy of Subgraphs, and allowing users to own their data in the web3 world.

What are Personal Subgraphs?

The simplest way to describe a Personal Subgraph is that it is a View on an Account. Normally, Subgraphs are built to give a View into a Protocol. As you can see in The Graph Explorer, almost every Subgraph is tied to a Protocol.

I believe there are many problems Personal Subgraphs can solve, but I would like to focus on one specifically - web3 wallets.

A Subgraph as a Personal Web3 Wallet

It is tough for a wallet application, such as Rainbow, to use Subgraphs to display a user’s holdings. The main reasons are as follows:

  • Indexing an ERC-20 token can be extremely time-consuming - Graph Node currently indexes a contract address from a startBlock and queries the Ethereum RPC for every event on that contract address. This makes indexing a token like USDT or USDC incredibly slow.
  • Wallets must display many different ERC-20s a user holds.
  • Wallets must show balances for every token the user holds. The only way to know this is to index everything (as Etherscan does) - or deliberately choose what to index.
    • A great attempt was made by OpenZeppelin to deliberately choose the top 20 ERC-20 tokens for a subgraph - by taking a combination of token lists and filtering for the most common. But even this is too slow, and it does not solve the wallet’s problem, since some users hold tokens outside of the Top 20.
  • Wallets must display many different NFTs a user holds - NFTs have become very popular, and users now want to see both ERC-20s and NFTs.

These reasons prevent web3 wallets from using Subgraphs. However, I believe there is a way to pull it off right now, without requiring the Graph Node software to index every token that has ever existed.

The Solution

Personal Subgraphs provide a solution to the wallet problem. I will break down the solution into three parts - what needs to be built at the Indexing Layer, what needs to be built at the Smart Contract Layer, and what needs to be built at the Product Layer.

Indexing Layer

Let’s start with Indexing first. This would involve a change to The Graph Node. The changes needed are as follows:

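  • Ethereum events can have up to 4 Topics, which are explained nicely here by the Ethers Documentation. For a standard (non-anonymous) event, topic0 is the hash of the event signature and the remaining topics hold the indexed parameters.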
  • Graph Node would have to add in the ability to filter on topics other than topic0.

  • This should speed up Subgraph syncing immensely for Subgraphs that want to filter on multiple topics.
  • It becomes possible to filter for Transfer events that only contain our user’s address in the to or from fields for ERC-20s. The same for ERC-721s.
  • I believe it is also possible to filter for Transfer events with a user address for all smart contracts on the network. This might still be slow; it would have to be tested. If it is too slow, we can filter on specific ERC-20s and NFTs chosen by the user. Example:
    • Bob owns 10 ERC-20s and 5 different ERC-721 tokens.
    • These 15 contracts are all in the Subgraph Manifest. Transfer events are indexed for each contract, filtered to those with the user’s address in either the to or the from field.
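
As a rough illustration of the topic filtering described above, here is a minimal Python sketch of the log filters (in the shape accepted by the eth_getLogs JSON-RPC method) that such a feature would effectively need to issue. The helper names are hypothetical, and this is not Graph Node's manifest syntax or implementation. Topic positions are ANDed together, so matching "from OR to" takes two filters.

```python
from typing import Dict, List, Optional

# keccak256("Transfer(address,address,uint256)"), shared by ERC-20 and ERC-721.
TRANSFER_TOPIC0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def address_to_topic(address: str) -> str:
    """Left-pad a 20-byte address to the 32-byte form used in indexed topics."""
    lowered = address.lower()
    hex_body = lowered[2:] if lowered.startswith("0x") else lowered
    if len(hex_body) != 40:
        raise ValueError(f"not a 20-byte address: {address}")
    int(hex_body, 16)  # raises ValueError on non-hex characters
    return "0x" + hex_body.rjust(64, "0")


def transfer_filters(user: str, tokens: Optional[List[str]] = None) -> List[Dict]:
    """Build two filters: transfers sent by `user` and transfers received by `user`.

    Omitting "address" matches every contract on the network; passing
    `tokens` narrows the scan to contracts chosen by the user.
    """
    user_topic = address_to_topic(user)
    outgoing = {"topics": [TRANSFER_TOPIC0, user_topic, None]}  # topic1 = from
    incoming = {"topics": [TRANSFER_TOPIC0, None, user_topic]}  # topic2 = to
    filters = [outgoing, incoming]
    if tokens is not None:
        for log_filter in filters:
            log_filter["address"] = [token.lower() for token in tokens]
    return filters
```

For Bob's case, `transfer_filters(bob, his_15_contracts)` would yield two filters scoped to those 15 contracts, while `transfer_filters(bob)` corresponds to the network-wide scan that still needs performance testing.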

Smart Contract Layer

Let’s now discuss how to implement it in the Smart Contracts. There are two ways to think about it.

Deploy Personal Subgraph to Index All Transfer Events on All Contracts

  • We could just deploy these Personal Subgraphs to the current network, and it should work.
  • If we filter on the indexed topics of the Transfer event (topics 1 and 2 for ERC-20s), and Graph Node can keep up, then ERC-20 balances should be obtainable for any User.
    • Note - this assumes that we can query the entire blockchain with all the topics as filters - i.e. you check every smart contract, but filter very tightly on the topics. I am fairly certain this is possible but I have not confirmed it.
  • Same for ERC-721s.
  • This would require no change to the Smart Contract Layer.
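
One detail worth keeping in mind for a network-wide scan: ERC-20 and ERC-721 share the same Transfer signature hash, so a single filter returns both kinds of logs. The sketch below (hypothetical helper names, assuming logs shaped like eth_getLogs results) tells them apart by topic count: a standard ERC-20 Transfer indexes from and to (3 topics), while a standard ERC-721 Transfer also indexes the token ID (4 topics).

```python
from typing import List, Optional

# keccak256("Transfer(address,address,uint256)")
TRANSFER_TOPIC0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def classify_transfer(topics: List[str]) -> str:
    """Guess the token standard of a log from its topics alone."""
    if not topics or topics[0].lower() != TRANSFER_TOPIC0:
        return "not-transfer"
    if len(topics) == 3:
        return "erc20"   # from, to indexed; value lives in the data field
    if len(topics) == 4:
        return "erc721"  # from, to, tokenId all indexed
    return "non-standard"


def topic_to_address(topic: str) -> str:
    """Take the low 20 bytes of a 32-byte topic as an address."""
    return "0x" + topic[-40:].lower()


def decode_transfer(log: dict) -> Optional[dict]:
    """Decode a standard Transfer log; return None for anything else."""
    topics = log["topics"]
    kind = classify_transfer(topics)
    if kind not in ("erc20", "erc721"):
        return None
    decoded = {
        "standard": kind,
        "token": log["address"].lower(),
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
    }
    if kind == "erc20":
        decoded["value"] = int(log["data"], 16)
    else:
        decoded["token_id"] = int(topics[3], 16)
    return decoded
```

Contracts that declare Transfer with different indexed parameters fall into the "non-standard" bucket and would need their own handling.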

Start thinking about Subgraphs as NFTs and Personal Data Ownership

  • Let’s consider that the user wants a specific View into their account. A View on 10 ERC-20s and 5 NFTs that they currently own.
  • We can anchor this information on-chain, to give the Subgraph triggers for what to index. As a simple example, a Sender account emits an on-chain event for each token to be tracked for a User account.
  • This shows how a user can indicate the specific data they want to be indexed. The View is derived from both the Sender and the User accounts. This means that anyone can build a View into any account.
  • A key thing I want to point out is that with this setup, a personal wallet subgraph can become auto-deployable code. The only parameters that change would be the Sender and User. This would create a unique and deterministic subgraph ID for anyone’s view into an Ethereum account.
    • I believe it could be some sort of a web3 personal data primitive. It needs more thought and discussion though.
  • The Subgraph also does not need to Index from the startBlock of the ERC-20 contract. At the instantiation of an ERC-20 being anchored, the Subgraph can do an eth_call to get the balance of the user for that token, and then index events forward from there.
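
The "snapshot, then index forward" idea in the last bullet can be sketched as follows. This is a plain-Python model rather than subgraph mapping code; `balance_of` is a hypothetical stand-in for an eth_call to the token's balanceOf at the anchoring block, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrackedBalance:
    token: str
    user: str
    start_block: int  # block at which the token was anchored
    balance: int      # snapshot value, then updated by later Transfer events


def anchor_token(
    token: str,
    user: str,
    block: int,
    balance_of: Callable[[str, str, int], int],
) -> TrackedBalance:
    """Snapshot the user's balance at the anchoring block instead of replaying history."""
    snapshot = balance_of(token, user, block)
    return TrackedBalance(token=token.lower(), user=user.lower(),
                          start_block=block, balance=snapshot)


def apply_transfer(tracked: TrackedBalance, sender: str, recipient: str,
                   value: int, block: int) -> None:
    """Apply one Transfer event that happened after the snapshot."""
    if block <= tracked.start_block:
        return  # already reflected in the snapshot (assumes end-of-block state)
    if sender.lower() == tracked.user:
        tracked.balance -= value
    if recipient.lower() == tracked.user:
        tracked.balance += value
```

The sketch assumes the snapshot reflects the state at the end of the anchoring block, so only events from later blocks are applied.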

Transaction Cost

  • Of course, this would be expensive. I tested it and it cost $20 to add 4 tokens through events.
    • It could also just be anchored on IPFS as a JSON file that the Subgraph parses. Thus, instead of emitting 20 events to add 20 ERC-20s, you just emit 1 IPFS hash to query, which contains the list of the 20 tokens. This should cost about $5 to index as many tokens as you’d like.
  • However, a better solution would be to build it into the core Graph Node software, rather than anchoring it to an expensive L1, or semi-expensive L2. This would make it free. I currently do not have a suggestion on how to fit this into the manifest; help here would be great.
  • The cost of deploying Subgraphs to the decentralized network right now is prohibitively expensive for an individual to pay for a Personal Subgraph. Each new Subgraph created costs about 0.1 ETH, or around $450 at today’s prices. For now, we can test with the Hosted Service, but long term it would have to become cheaper.
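
To make the IPFS variant and its cost argument concrete, here is a small sketch. The JSON layout (a single "tokens" array of addresses) is an assumed format, not one defined in this post, and the per-event cost is simply the $20 for 4 tokens figure above divided out; real gas costs will vary.

```python
import json
from typing import List

# Derived from the test above: $20 for 4 anchoring events (an approximation).
COST_PER_EVENT_USD = 5.0


def parse_token_list(raw: str) -> List[str]:
    """Parse a hypothetical IPFS-hosted JSON file of the form {"tokens": ["0x...", ...]}."""
    document = json.loads(raw)
    tokens: List[str] = []
    for entry in document["tokens"]:
        address = entry.lower()
        if not (address.startswith("0x") and len(address) == 42):
            raise ValueError(f"not a 20-byte address: {entry}")
        int(address, 16)  # raises ValueError on non-hex characters
        if address not in tokens:  # drop duplicates, keep first-seen order
            tokens.append(address)
    return tokens


def anchoring_cost_usd(token_count: int, via_ipfs: bool) -> float:
    """One event per token, or a single event carrying one IPFS hash."""
    events = 1 if via_ipfs else token_count
    return events * COST_PER_EVENT_USD
```

Under these assumptions, adding 20 tokens one event at a time extrapolates to about $100, while the single-hash approach stays at about $5 regardless of list length.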

Product Layer

Let’s now discuss how we could create a product that would allow anyone to create their own personal subgraph. I can think of two ways this would happen:

  • A user coming to a website to do it themselves, to monitor their own balances in a wallet
  • A Web3 Wallet that ties into this system and does it behind the scenes for its users.

In either case, this is what it could look like:

  • Offer a dropdown list of NFTs and ERC-20s. This could be an existing curated list, such as Token Lists. The user picks the tokens they own so they can be added to the wallet.
  • For rare and exotic tokens not on a Token List, allow a user to directly paste in the contract address of any token.
  • With this information, the Subgraph can now be deployed. The Subgraph would be a template, such as the OpenZeppelin Subgraphs. It would require the following inputs:
    • The address of the user.
    • The address of the sender/creator of the Subgraph, to create the unique View. And as explained before, it would be unique and deterministic.
    • All the tokens the user chose from the dropdown, to filter on the specific tokens (if needed).
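
A minimal sketch of how those three inputs could be turned into a deterministic View. The post does not specify how the ID would be derived, so SHA-256 over the two addresses is purely a stand-in, and the config layout and function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def view_id(sender: str, user: str) -> str:
    """Derive a stable ID from the (sender, user) pair; order matters."""
    payload = (sender.lower() + ":" + user.lower()).encode("ascii")
    return hashlib.sha256(payload).hexdigest()


def build_view_config(sender: str, user: str, tokens: List[str]) -> Dict:
    """Collect the template inputs listed above into one config object."""
    return {
        "id": view_id(sender, user),
        "sender": sender.lower(),
        "user": user.lower(),
        # Deduplicated and sorted so the same selection always yields the same config.
        "tokens": sorted({token.lower() for token in tokens}),
    }
```

Because the ID depends only on the two addresses, anyone can recompute it for a given Sender and User, and two different Senders get separate Views into the same User account.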

Proof of Concept

I have implemented a proof of concept here. I hacked it together, since there is no ability to filter on topics 1, 2, and 3. It contains the following:

  • A simple PersonalSubgraphAnchor contract, which emits events naming the ERC-20s to track for an address; these act as triggers for the Subgraph.
  • A Subgraph that tracks the Anchor contract, and then uses Data Source Templates to track all ERC-20 tokens that are emitted for that View.
  • The Ethereum Account we are Indexing is Binance 14 (as labeled by Etherscan). This was chosen as it would be a highly active account on Ethereum with many transactions.
  • The Subgraph does not index any of the tokens from their start blocks. It does a contract call to get the balances of Binance 14 and then indexes every block for those tokens from that point forward.
  • The personal subgraph is deployed here. It can be seen that it is successfully syncing this account for 4 tokens: AAVE, DYDX, USDT, and USDC. It has been syncing smoothly for over two weeks.
  • It does not include a RemoveToken ability to stop watching a token, but ideally a real implementation would.
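
For discussion, here is one way the add/remove bookkeeping could look, modelled in plain Python rather than as mapping code. The handler names and the removal event are hypothetical (the POC has no removal), and the sketch treats removal as a flag that Transfer handlers check rather than as tearing down a started data source.

```python
from typing import Dict, Set, Tuple

ViewKey = Tuple[str, str]  # (sender, user)


class ViewRegistry:
    """Tracks which tokens each View is watching."""

    def __init__(self) -> None:
        self.started: Dict[ViewKey, Set[str]] = {}  # template instances ever created
        self.active: Dict[ViewKey, Set[str]] = {}   # tokens currently being watched

    @staticmethod
    def _key(sender: str, user: str) -> ViewKey:
        return (sender.lower(), user.lower())

    def handle_token_added(self, sender: str, user: str, token: str) -> bool:
        """Return True when a new template instance would need to be started."""
        key = self._key(sender, user)
        token = token.lower()
        started = self.started.setdefault(key, set())
        is_new = token not in started
        started.add(token)
        self.active.setdefault(key, set()).add(token)
        return is_new

    def handle_token_removed(self, sender: str, user: str, token: str) -> None:
        """Stop watching a token; unknown views or tokens are ignored."""
        self.active.get(self._key(sender, user), set()).discard(token.lower())

    def should_index(self, sender: str, user: str, token: str) -> bool:
        """Transfer handlers would call this before updating balances."""
        return token.lower() in self.active.get(self._key(sender, user), set())
```

Re-adding a previously removed token only flips the flag back on, so no second template instance is needed for the same contract.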

Limitations

  • It only works well with established standards. When talking about ERC-20s or ERC-721s, any contracts that don’t follow the standard (e.g. CryptoPunks) might not work.
  • It only improves Subgraphs that need filtering on all event topics. It works very well for ERC-20s and ERC-721s. It may have other good uses as well. But many protocol Subgraphs will gain no benefit from this upgrade.

Future Ideas for Personal Subgraphs That Need More Discussion

As I noted above, this has the potential to be a web3 data primitive, but it needs more thought. I will list some ideas I have on what could be built.

Subgraphs as NFTs and Personal Data Ownership

  • There is an open PR to the contracts where you can see an implementation of Subgraphs as NFTs. The concept of Subgraph as NFTs is very interesting, especially when you start to think about data ownership.
  • I believe every data protocol focused on “owning your personal data” could just build Subgraph NFTs, and build their protocol on top of The Graph.
  • The cost is a bit insane to launch an ERC-20 for a personal subgraph. We need to solve this of course. But it is a problem everyone faces.

Privacy

  • In any discussion of personal data, privacy is needed. We will need to think about how to protect users’ data if people are building personal subgraphs.
  • The API access to query the subgraph could be limited to the NFT holder. As the web3 ownership economy matures, we likely will see the disappearance of API Keys, replaced by NFTs as access keys.

Creation of a Decentralized Competitor to Blockfolio

  • For the Personal Wallet Subgraph, we could build a module that allows for the charting of the user’s financial assets. This would allow for a decentralized competitor to Blockfolio (FTX).

Creating Views that combine Multiple Ethereum Accounts. Or even Accounts Cross-Chain

  • Most users own multiple accounts across multiple chains. We can aggregate these into a single Subgraph.
  • The trick here is privacy. If you combine your work wallet and your personal savings wallet, you’ve just doxxed your savings to your employer.

Perpetually Free Personal Subgraph with a Threshold of Delegated GRT

  • In an ideal world, every person should be able to mint a basic Subgraph NFT for their own data very cheaply, or for free.
  • They should be able to stake some small amount of GRT in perpetuity. The rewards from this GRT could get routed directly to an Indexer, to pay for ongoing Indexing and query fees of their Personal Subgraph.

Conclusion

I will list the important parts to pull from this long post:

  • Web3 wallets can integrate with The Graph with an upgrade to Graph Node to filter on all topics.
  • I believe if we could get this implemented, we could reach 10,000+ Personal Subgraphs deployed, if a web3 wallet decided to integrate. (Gas issues still exist; the hosted service could be used in the short term.)
  • There is a rough POC showing how it would work (without the filtering).
  • I believe Subgraphs as NFTs could be a web3 data primitive.
    • Personalized Subgraphs could be a huge part of this, although I believe much effort will be needed on the privacy side.

Long term I hope this builds more discussion and ideas around centering Subgraphs around Users instead of Protocols, as well as the privacy around these Subgraphs.

Open Questions

  • Are there other use cases for filter-by-topic syncing? (The open feature request makes me believe there are).
  • What would the upgrade for Graph Node look like and how hard would it be? What would the manifest change look like to filter only on an event topic?
  • How could we get web3 wallets such as Rainbow to use this?
  • How can we make it cheap enough so anyone can deploy a Personal Subgraph to the Decentralized Network? Could we partition all of them to an L2?
  • How do we deal with privacy long term, when users want to combine multiple of their accounts in a single View, but not doxx themselves and their connecting accounts?
  • Is there a better, more general name than Personal Subgraphs?

Hey @davekaj thanks for sharing!

“However, I believe there is a way to pull it off right now, without requiring the Graph Node software to index every token that has ever existed.”

As it happens, we are working on how we can make this feasible in Graph Node.

“Graph Node would have to add in the ability to filter on topics other than topic0.”

This is definitely interesting functionality. Beyond the user-centric use case you highlight, it has also been requested for filtering ERC-20 transfers that involve only a specific contract or set of contracts (e.g. I want to see USDC transfers touching my protocol contracts).

On per-user subgraphs - this goes in the direction of having thousands → millions of individual subgraphs. I think there is an interesting discussion here, as that will significantly fragment signal & allocation on the network. I am not sure if @Brandon has ideas here on the number of subgraphs the network can support (by order of magnitude)

“The Subgraph also does not need to Index from the startBlock of the ERC-20 contract. At the instantiation of an ERC-20 being anchored, the Subgraph can do an eth_call to get the balance of the user for that token, and then index events forward from there.”

I like that pattern to get around the historic indexing problem.

Nice!

This is interesting. All of this data is available in block explorers, so I don’t think personal subgraphs reduce privacy? I guess there is potentially easier connection of different accounts.

This is a cool line of thinking.

I think there is also the general question as to whether a single “public-good” type subgraph indexing all token transfers for all addresses would be a preferable state, as long as that was performant (indexing & querying)

Thanks for the write-up!

Thanks for the response! Sending a few responses back myself:

Yes it would fragment it. Which may or may not be bad. I think this depends on architecture, implementation, and user experience. My guesses right now:

  • Let’s say you wanted 1 billion people to be on web3 in 5 years. Can we make 1 billion different subgraphs? My best guess is this is not the greatest way to do it.
  • But I do believe we need some way to show a user centric view. Subgraph composition can probably help with this. What I am thinking is that the user pin-points queries they want to make for specific data that pertains to themselves, through 1 or a few Composed Subgraphs. Now the view is user-centric, but because of the queries, not the indexing.
  • This might not be a perfect separation point - but it seems like 1 billion subgraphs would be 1 billion different databases, with few rows and columns. Versus maybe, 10,000-50,000 bigger subgraphs, with many rows and columns.

Thinking out loud, the second option seems more likely to work from an architectural perspective.

Right now I can say with 100% certainty that the protocol on Ethereum L1 could not handle that many subgraphs. An important question, probably worth its own thread, is - “How many subgraphs do we expect to see in 5 years (or 10 years)?” Because this should influence what kind of scaling solution The Graph needs to choose. If it is millions, we pretty much need to be our own rollup.

My gut feeling is that this single public-good subgraph may elude us forever. A few reasons:

  • I expect the number of tokens deployed to increase exponentially for 5-10 years. With this, token transfers will increase even faster.
  • There are truly some shit-tokens out there that do not deserve to be included. The subgraph itself can’t decide on this. It would have to adopt some on chain registry that decides an o.k. list. Maybe through a token-curated-registry, or something.
  • Tokens will be deployed across multiple blockchains, and users will want their cross chain balances displayed to them very soon (1-2 years?)
  • These three points make me feel we will never get to this public good subgraph, without making clear-cut decisions to eliminate some tokens, and choose chains to include. It might be a public good for some but not for all. (i.e. it might have a better UX for Ethereum tokens, compared to Solana, or some other blockchain).
  • Also, to build this subgraph, to me, seems like indexing 95-99% of the financial transactions in the world (10 years from now). Seems to me this would be a behemoth task, maybe not suited for a decentralized network.

I would sum up my second point as a theoretical question - “What size subgraph is too big to be indexed by the decentralized network?” Which I think is a great question - as you obviously can’t index everything in the world. There is some reasonable limit.

Sorry! I’ve left the conversation at a point with two more theoretical questions… which may not do us much good right now. They feel like future-looking questions.
