A framework for building data applications entirely on decentralized Web3 infrastructure

Hi community,

This is Kuan Huang from www.scout.cool. I am proposing an open-source software framework that enables developers to quickly build data applications entirely on Web3 infrastructure. Whether you are building an analytics dashboard for a dApp/protocol or leveraging existing machine learning models for data science work, this framework should save you a lot of time.

Problem:
Every blockchain data analytics company we know of (including ourselves) uses centralized databases to index blockchain data before serving it to users. As a result, the majority of data analytics charts and dashboards live on centralized hosting services and databases. This approach is not only the opposite of where Web3 is heading but also creates a few real issues:

  1. Users are locked into proprietary web platforms because data indexing and hosting are centralized.
  2. The lack of transparency in how data is indexed and organized in a centralized database makes issues very difficult to debug. We have personally run into numerous cases where the on-chain metrics we put together differ from other vendors’.
  3. As faster L1 and L2 blockchains become available, this approach is becoming the biggest bottleneck for every data company trying to scale.

Proposed Solution:
The MVP of the proposed software framework should provide:

  1. A generic module that efficiently pulls data from any subgraph or other on-chain/off-chain data source. This module should take care of common challenges such as pulling from multiple subgraphs in parallel, handling subgraph “pagination” in the background, and so on. (One of the grantees from the first wave, Keyko, might be working on some parts of what I am describing.) A rough sketch of how points 1 and 2 might feel to a developer follows this list.

  2. A data transformation module that does the preparation work before visualization. Some existing packages, such as Pandas and TensorFlow, can be reused here. This also opens the door to machine learning applications that leverage The Graph.

  3. Pre-built widgets (with options to customize) that render charts and dashboards. Developers should be able to render a chart or lay out a dashboard in very few lines of code, without touching any front-end code.

  4. A simple mechanism for deploying and sharing through decentralized hosting/storage services, so the entire community can discover and learn from these applications.

  5. An easy-to-maintain structure, since frequent updates to these applications are expected.
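To make the first two modules concrete, here is a minimal sketch of the kind of code the framework could hide behind a single call. The Uniswap v2 subgraph endpoint and its `pairs` entity are used purely as an illustration; nothing below is part of an existing framework, and a real implementation would also need to work around The Graph’s skip limit (e.g. by paginating on id).

```python
import requests
import pandas as pd

# Illustrative public subgraph; any subgraph with a paginated entity would do.
SUBGRAPH_URL = "https://api.thegraph.com/subgraphs/name/uniswap/uniswap-v2"

def query_all(entity: str, fields: str, page_size: int = 1000) -> list:
    """Pull every row of an entity, handling first/skip pagination in the background."""
    rows, skip = [], 0
    while True:
        query = f"""
        {{
          {entity}(first: {page_size}, skip: {skip}, orderBy: id) {{
            {fields}
          }}
        }}
        """
        resp = requests.post(SUBGRAPH_URL, json={"query": query})
        resp.raise_for_status()
        batch = resp.json()["data"][entity]
        rows.extend(batch)
        if len(batch) < page_size:
            return rows
        skip += page_size

# Module 2: load into Pandas and aggregate before visualization.
pairs = pd.DataFrame(query_all("pairs", "id volumeUSD createdAtTimestamp"))
pairs["volumeUSD"] = pairs["volumeUSD"].astype(float)
pairs["created"] = pd.to_datetime(pairs["createdAtTimestamp"].astype(int), unit="s")
daily_volume = pairs.groupby(pairs["created"].dt.date)["volumeUSD"].sum()
```

Module 3 would then turn something like daily_volume into a chart with a single call, similar to how Streamlit or Vega-Lite widgets work today.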

Why now:
We are beginning to see what the Web3 infrastructure might look like in the near future:

  1. Decentralized query layers such as The Graph (700+ subgraphs and growing!)
  2. Decentralized storage and hosting services such as IPFS, textile.io, Fleek.co
  3. Decentralized code repositories such as Radicle
  4. Decentralized database/caching layers such as ThreadDB and Gun DB

Some inspirations: (Sorry, I am not allowed to post more than two links as a new user in this forum)
Pandas
Observablehq
Streamlit
Vega-lite


Hey this is great! That problem statement definitely resonates.
I would be interested in how you think the different parts of that stack best fit together. For example, we are looking at ways of extending subgraph functionality to make it easier to extract data (which might provide some of the functionality you describe in the “generic module”), up to and including “analytics”-type functionality (aggregations and transformations). There have also been requests for “custom resolvers”, to give subgraph developers more flexibility in how the underlying data can be queried via GraphQL.
However, I do also definitely see value in separating concerns.
Might you be free to have a quick call to discuss?
PS vega-lite & streamlit are great!

Hi Adam,

I am based in NYC (EST). What’s your email? I can send you a few available slots for a call. Mine is huangkuan at gmail.

It’s definitely nice to have the flexibility to extract/aggregate/transform data in the subgraph, but we have also found benefits in being able to do it at the application level.

For example, you have a subgraph that powers your web application, but it is only 70% optimized for an analytics chart that you are building. Writing a few lines of code to transform the data (a quick sketch is below) is probably easier than updating the subgraph and redeploying it.
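To illustrate, here is a rough example of the kind of application-level reshaping I mean, using made-up sample rows. The swap entity and field names are just placeholders for whatever the existing subgraph already serves to the web app.

```python
import pandas as pd

# Made-up sample of the rows an existing subgraph might already return.
swaps = pd.DataFrame({
    "timestamp": [1612137600, 1612224000, 1612828800],
    "sender": ["0xabc", "0xdef", "0xabc"],
})

# The analytics chart needs weekly active users, which the subgraph does not
# expose directly; a couple of lines of Pandas get there without redeploying it.
swaps["week"] = pd.to_datetime(swaps["timestamp"], unit="s").dt.to_period("W")
weekly_active_users = swaps.groupby("week")["sender"].nunique()
```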

Here is another example. Let’s say I am building this project: https://beta.web3index.org/. Being able to aggregate/transform data at the application level allows me to leverage all the existing public subgraphs without creating my own and syncing blockchain data from the beginning (again, sketched below).
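A rough sketch of that use case, assuming two hypothetical public subgraphs that model fees differently. The endpoints, queries, and field names below are all placeholders, not real subgraphs.

```python
import requests
import pandas as pd

# Each protocol's subgraph exposes fee data under its own entity and field
# names, so the normalization happens at the application level.
SOURCES = {
    "protocol_a": {
        "url": "https://api.thegraph.com/subgraphs/name/org-a/protocol-a",
        "query": "{ dayDatas(first: 30, orderBy: date, orderDirection: desc) { date feesUSD } }",
        "rename": {"date": "date", "feesUSD": "fees_usd"},
    },
    "protocol_b": {
        "url": "https://api.thegraph.com/subgraphs/name/org-b/protocol-b",
        "query": "{ dailyStats(first: 30, orderBy: timestamp, orderDirection: desc) { timestamp revenueUSD } }",
        "rename": {"timestamp": "date", "revenueUSD": "fees_usd"},
    },
}

def run_query(url: str, query: str) -> dict:
    resp = requests.post(url, json={"query": query})
    resp.raise_for_status()
    return resp.json()["data"]

frames = []
for protocol, cfg in SOURCES.items():
    data = run_query(cfg["url"], cfg["query"])
    records = next(iter(data.values()))  # the single top-level entity
    df = pd.DataFrame(records).rename(columns=cfg["rename"])
    df["protocol"] = protocol
    frames.append(df)

# One table, one schema, no new subgraph to write or sync.
combined = pd.concat(frames, ignore_index=True)[["protocol", "date", "fees_usd"]]
```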

Great point on giving applications flexibility, in particular when it comes to leveraging subgraphs created and maintained by other developers.
Looking forward to catching up on this next week.
