Final Grant Report and suggestions on integrating subgraphs with Sia storage (or IPFS)

I created a subgraph for my app SkyDocs (a decentralized Google Docs app) so that users can share files with each other. SkyDocs uses Sia Skynet for storage, and I have some feedback that would make integrating Sia (or IPFS) with The Graph easier.

I’d love to hear what others think about these suggestions.

This feedback is part of my final grant report available here:
https://www.michielpost.nl/posts/skydocs-and-the-graph-final-grant-report


Subgraph development
The data from the smart contract goes through a mapping file, which produces the data that will be indexed. In this mapping, it’s not possible to make an HTTP request. What I wanted was to make an HTTP request to get the referenced file from Sia Skynet and index data from that file.
This would result in less data on the blockchain, so a cheaper solution.
There is an IPFS package for The Graph, to get data from IPFS. A similar Sia package would be helpful.

Gas fees
A big limitation is that there has to be an on-chain action for The Graph to index. But on-chain transactions cost gas, and with high gas fees there is no incentive to use this smart contract. Why would you pay gas to share a document?

As mentioned before, The Graph can already index data from IPFS. For example, when an IPFS hash is posted to a smart contract, this hash will be available in your mapping. But this still needs an on-chain transaction.

It would be really cool if The Graph could index immutable data on IPFS or Sia without an on-chain transaction. This would make The Graph useful for a lot more web3 scenarios.

This is not an easy problem to solve. Even though data on IPFS or Sia is immutable, fetching it makes the indexing process non-deterministic: the network may fail, or the IPFS data may no longer be available.

A possible implementation from a developer perspective could be something like this:
The Graph could create an API endpoint that accepts IPFS or Sia hash/file locations.
It then fetches the data and processes it as if it were a smart contract event. Maybe this data could be stored on a private (sub)blockchain, so it’s similar to how The Graph indexes other blockchains. This could eliminate the need for on-chain transactions with gas fees.

Example:

  1. Send the file hash to The Graph API
     POST: https://thegraph.network/api/add
     Data: sia://hash
  2. Load data from sia://hash
  3. Data is stored on a special The Graph blockchain; fees can be prepaid in GRT?
  4. The mapping file gets the data on the blockchain as input and allows you to create subgraphs
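
To make the proposed flow a bit more concrete, here is a minimal TypeScript sketch of it. Everything in it is an assumption for illustration only: `SubmissionLog`, `indexLog`, the `Mapping` type and the in-memory `Storage` map are hypothetical names, not part of The Graph, and the log simply stands in for the "special The Graph blockchain".

```typescript
// Hypothetical sketch: none of these names exist in The Graph today.

// Stand-in for a storage network (Sia or IPFS): content address -> file text.
type Storage = Map<string, string>;

// A submitted hash, recorded in order, playing the role of an on-chain event.
interface Submission {
  sequence: number; // position in the ordered log
  uri: string;      // e.g. "sia://hash"
}

// Ordered log standing in for the "special The Graph blockchain".
class SubmissionLog {
  private entries: Submission[] = [];

  // Plays the role of the proposed POST /api/add call.
  add(uri: string): Submission {
    const entry: Submission = { sequence: this.entries.length, uri };
    this.entries.push(entry);
    return entry;
  }

  all(): readonly Submission[] {
    return this.entries;
  }
}

// A mapping receives the submission plus the fetched file and returns an entity.
type Mapping<E> = (submission: Submission, content: string) => E;

// Replays the log through the mapping. Files that cannot be fetched are skipped.
function indexLog<E>(log: SubmissionLog, storage: Storage, mapping: Mapping<E>): E[] {
  const entities: E[] = [];
  for (const submission of log.all()) {
    const content = storage.get(submission.uri);
    if (content === undefined) continue; // file unavailable
    entities.push(mapping(submission, content));
  }
  return entities;
}

// Usage: one file is available, one is not.
const storage: Storage = new Map([["sia://abc", '{"title":"Notes"}']]);
const log = new SubmissionLog();
log.add("sia://abc");
log.add("sia://missing");

const docs = indexLog(log, storage, (s, content) => ({
  id: s.uri,
  title: String(JSON.parse(content).title),
}));
// docs holds a single entity, for "sia://abc"
```

Skipping an unavailable file is the simplest choice for a sketch, but it is also exactly where the non-determinism comes in: two indexers replaying the same log could end up with different results if only one of them can reach the file.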

More The Graph metadata in Git
I like to store as much data as possible in source control (Git). It would be nice if the description, logo and example queries could be added to subgraph.yaml or another file that gets deployed to The Graph.


Hey @michiel :wave:

Happy to have you building on The Graph! :slight_smile:

This decentralized collaborative workspace is a nice use case that pushes the network’s current capabilities. As you mentioned, there are a few limitations at the moment, which should be lifted with better support for storage networks. However, keep in mind that The Graph is not intended to be a storage solution, nor any type of blockchain itself. It’s impossible to index data from networks like IPFS, as there is no chain head to keep up with (and, thus, to index sourced data from). The Graph is just not quite ready yet for this type of use case, where you need to be aware of new files uploaded to a decentralized storage network whose resulting hash is not anchored on any typical chain.

About the proposal itself, I’m wondering how you would handle parsed data from those hashes and store it in a known schema, as you can’t know the contents’ structure beforehand. Also, are you suggesting that sourced data from IPFS/Sia is parsed and then stored on a blockchain so that it can be indexed? How is that any better, if you’re inefficiently storing that data on a blockchain? Why use storage networks in the first place then?

With that said, stay on the lookout! There might be upcoming features facilitating or enabling these types of use cases in a gasless manner.

Thanks for sharing this and the comprehensive blog post. :+1:


I’m not suggesting storing the content of IPFS files on a blockchain. But to stay close to the current way of working, we could set up an “ipfs-thegraph-blockchain” to which you can submit hashes of IPFS files. This solves the discoverability part, so we know what to index on IPFS.

That way, it also works just like an Ethereum smart contract. But instead of sending an IPFS hash to a smart contract, we send it to the special ipfs-thegraph-blockchain.
Once the hash is on the ipfs-thegraph-blockchain, The Graph indexers can parse the hash (same as from the Ethereum blockchain) and then load the ipfs file content and apply mapping, indexing etc.

And you’re right that we don’t know the contents’ structure beforehand when we start mapping. But there are ways to fix that, such as validating the content against a JSON schema; a file size limit would probably also be needed.
Currently, subgraph mappings can also encounter unexpected data and throw an exception, right?
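
As a rough illustration of that validation step, here is a TypeScript sketch that applies a size limit and a shape check before the data would reach a mapping. It uses a hand-written check rather than a real JSON Schema validator, and the `SharedDoc` shape, the `MAX_FILE_CHARS` limit and `parseSharedDoc` are all assumptions made up for this example.

```typescript
// Hypothetical shape of a shared-document file; not taken from SkyDocs.
interface SharedDoc {
  title: string;
  keywords: string[];
}

// Assumed limit, measured in UTF-16 code units of the raw text.
const MAX_FILE_CHARS = 10000;

type ParseResult =
  | { ok: true; value: SharedDoc }
  | { ok: false; reason: string };

// Rejects oversized, malformed or unexpectedly shaped files instead of throwing.
function parseSharedDoc(raw: string): ParseResult {
  if (raw.length > MAX_FILE_CHARS) {
    return { ok: false, reason: "file too large" };
  }
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, reason: "expected a JSON object" };
  }
  const record = data as Record<string, unknown>;
  const title = record["title"];
  const keywords = record["keywords"];
  if (typeof title !== "string") {
    return { ok: false, reason: "title must be a string" };
  }
  if (!Array.isArray(keywords) || !keywords.every((k) => typeof k === "string")) {
    return { ok: false, reason: "keywords must be an array of strings" };
  }
  return { ok: true, value: { title, keywords: keywords as string[] } };
}

// Usage
const good = parseSharedDoc('{"title":"Notes","keywords":["a","b"]}'); // accepted
const bad = parseSharedDoc('{"title":42}');                            // rejected: wrong type
const broken = parseSharedDoc("not json");                             // rejected: parse error
```

Returning a result value rather than throwing means one bad file would not have to stop indexing of everything after it.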


This is exactly the thing I was looking for.

If I understand what you want to do, this is how I would want to use it:

  1. On the dapp add a product
  2. Add a keyword file/list to IPFS
  3. Emit event with ipfs hash + basic product info
  4. The Graph gets the event, indexes the product
  5. Retrieves the IPFS keyword file
  6. Associates/indexes the data with the product
  7. Can query products by keyword
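
The steps above could be sketched roughly as follows in TypeScript. This is not a real subgraph mapping: `ProductAdded`, `ProductIndex` and the `fetchFile` callback are hypothetical stand-ins for the contract event, the indexed store and the IPFS read, and the keyword file is assumed to hold one keyword per line.

```typescript
// Hypothetical event emitted by the dapp contract (step 3).
interface ProductAdded {
  productId: string;
  name: string;
  keywordsHash: string; // IPFS hash of the keyword file
}

interface Product {
  id: string;
  name: string;
  keywords: string[];
}

// Stand-in for an IPFS read; returns null when the file is unavailable.
type FetchFile = (hash: string) => string | null;

class ProductIndex {
  private products = new Map<string, Product>();
  private byKeyword = new Map<string, Set<string>>();
  private fetchFile: FetchFile;

  constructor(fetchFile: FetchFile) {
    this.fetchFile = fetchFile;
  }

  // Steps 4-6: index the product, fetch its keyword file, link the two.
  handleProductAdded(event: ProductAdded): void {
    const product: Product = { id: event.productId, name: event.name, keywords: [] };
    this.products.set(product.id, product);

    const file = this.fetchFile(event.keywordsHash);
    if (file === null) return; // product stays indexed, just without keywords

    // Assumes one keyword per line.
    for (const line of file.split("\n")) {
      const keyword = line.trim().toLowerCase();
      if (keyword === "" || product.keywords.indexOf(keyword) !== -1) continue;
      product.keywords.push(keyword);
      let ids = this.byKeyword.get(keyword);
      if (ids === undefined) {
        ids = new Set<string>();
        this.byKeyword.set(keyword, ids);
      }
      ids.add(product.id);
    }
  }

  // Step 7: query products by keyword (case-insensitive).
  findByKeyword(keyword: string): Product[] {
    const ids = this.byKeyword.get(keyword.trim().toLowerCase());
    const result: Product[] = [];
    if (ids === undefined) return result;
    ids.forEach((id) => {
      const product = this.products.get(id);
      if (product !== undefined) result.push(product);
    });
    return result;
  }
}

// Usage: the second product points at a keyword file that cannot be fetched.
const files = new Map<string, string>([["QmKeywords1", "Coffee\nbeans\n"]]);
const index = new ProductIndex((hash) => {
  const file = files.get(hash);
  return file === undefined ? null : file;
});
index.handleProductAdded({ productId: "1", name: "Espresso blend", keywordsHash: "QmKeywords1" });
index.handleProductAdded({ productId: "2", name: "Mug", keywordsHash: "QmMissing" });

const hits = index.findByKeyword("coffee"); // one product: "Espresso blend"
```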

Yes, it could work that way. Great to see that I’m not the only one needing this kind of functionality.

But it doesn’t look like The Graph will support this kind of scenario any time soon.