A framework to build data applications entirely on decentralized Web3 infrastructure

Hi community,

This is Kuan Huang from www.scout.cool. I am proposing an open source software framework that enables developers to quickly build data applications entirely on Web3 infrastructure. Whether you are building an analytics dashboard for a dApp/protocol or leveraging existing machine learning models for data science work, this framework should save you a lot of time.

Problem:
Every blockchain data analytics company we know of (including ourselves) uses centralized databases to index blockchain data before serving it to users. As a result, the majority of data analytics charts and dashboards live on centralized hosting services and databases. This approach is not only the opposite of where Web3 is heading but also creates a few real issues:

  1. Users are locked into a proprietary web platform due to the centralization of data indexes and hosting.
  2. The lack of transparency in how data is indexed and organized in a centralized database makes issues very difficult to debug. We have personally run into numerous cases where the on-chain metrics we put together differ from other vendors’.
  3. As faster L1 and L2 blockchains become available, this approach is becoming the biggest bottleneck for every data company trying to scale.

Proposed Solution:
The MVP of the proposed software framework should provide:

  1. A generic module that efficiently pulls data from any subgraph or other on-chain/off-chain data sources. This module should take care of common challenges such as pulling from multiple subgraphs in parallel and handling subgraph “pagination” in the background. (One of the grantees from the first wave, Keyko, might be working on parts of what I am describing.)

  2. A data transformation module that prepares the data before visualization. Some existing packages, such as Pandas and TensorFlow, can be reused here. This also opens a door for machine learning applications that leverage The Graph.

  3. Pre-built widgets (with options to customize) to render charts and dashboards. Developers should be able to render a chart or design a dashboard in very few lines of code, without touching any front-end code. (See the sketch after this list for what that could look like.)

  4. A simple mechanism for deploying and sharing through decentralized hosting/storage services, so the entire community can discover and learn.

  5. An easy-to-maintain structure, since frequent updates to the applications are expected.
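To make items 1-3 concrete, here is a minimal sketch of what the developer experience could feel like, written with today’s off-the-shelf tools (requests, Pandas, Matplotlib). The subgraph endpoint and entity fields are assumptions for illustration; the proposed framework would hide the query and pagination plumbing behind a single call:

```python
# A minimal sketch of the proposed workflow using off-the-shelf tools.
# The subgraph URL below is the old hosted-service endpoint for Uniswap v2
# and may no longer be live; it is here purely for illustration.
import requests
import pandas as pd
import matplotlib.pyplot as plt

SUBGRAPH = "https://api.thegraph.com/subgraphs/name/uniswap/uniswap-v2"

query = """
{
  pairDayDatas(first: 100, orderBy: date, orderDirection: asc) {
    date
    dailyVolumeUSD
  }
}
"""

# Module 1: pull data from a subgraph (pagination omitted here; the
# framework would handle it transparently).
rows = requests.post(SUBGRAPH, json={"query": query}).json()["data"]["pairDayDatas"]

# Module 2: transform with Pandas. The Graph returns decimals as strings,
# so cast columns to proper types first.
df = pd.DataFrame(rows)
df["date"] = pd.to_datetime(df["date"].astype(int), unit="s")
df["dailyVolumeUSD"] = df["dailyVolumeUSD"].astype(float)

# Module 3: render a chart in a couple of lines.
df.plot(x="date", y="dailyVolumeUSD")
plt.show()
```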

Why now:
We are beginning to see what the Web3 infrastructure might look like in the near future:

  1. Decentralized query layers such as The Graph (700+ subgraphs and growing!)
  2. Decentralized storage and hosting services such as IPFS, textile.io, Fleek.co
  3. Decentralized code repositories such as Radicle
  4. Decentralized database/caching layers: ThreadDB, GunDB

Some inspirations (sorry, I am not allowed to post more than two links as a new user on this forum):
Pandas
Observablehq
Streamlit
Vega-lite


Hey, this is great! That problem statement definitely resonates.
I would be interested in how you think the different parts of that stack best fit together - for example, we are looking at ways of extending subgraph functionality to make it easier to extract data (which might provide some of the functionality you describe in the “generic module”), up to and including “analytics”-type functionality (aggregations & transformations). There have also been requests for “custom resolvers”, to give subgraph developers more flexibility in how the underlying data can be queried via GraphQL.
However I do also definitely see value in separating concerns.
Might you be free to have a quick call to discuss?
PS Vega-Lite & Streamlit are great!

Hi Adam,

I am based in NYC (EST). What’s your email? I can send you a few available slots for a call. Mine is huangkuan at gmail.

It’s definitely nice to have the flexibility to extract/aggregate/transform data in the subgraph. But we have also found benefits in being able to do it at the application level.

For example, say you have a subgraph that powers your web application, but it is only 70% optimized for an analytics chart you are building. Writing a few lines of code to transform the data is probably easier than updating the subgraph and redeploying it.
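As a concrete (hypothetical) illustration: suppose the subgraph returns raw deposit rows but the chart needs daily totals. The column names below are made up, and the string-typed values mimic how The Graph returns data:

```python
# Hypothetical application-level transformation with Pandas: the subgraph
# gives raw per-transaction rows, but the chart needs daily totals.
import pandas as pd

# Assumed shape of the subgraph response (string-typed, as The Graph
# returns it); column names are illustrative.
df = pd.DataFrame({
    "timestamp": ["1609459200", "1609470000", "1609545600"],
    "amountUSD": ["1500.0", "250.5", "980.25"],
})

df["timestamp"] = pd.to_datetime(df["timestamp"].astype(int), unit="s")
df["amountUSD"] = df["amountUSD"].astype(float)

# A few lines of reshaping instead of a subgraph update + redeploy.
daily = df.resample("D", on="timestamp")["amountUSD"].sum()
print(daily)
```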

Here is another example. Let’s say I am building this project: https://beta.web3index.org/. Being able to aggregate/transform data at the application level lets me leverage all the existing public subgraphs without creating my own and syncing blockchain data from the beginning.

Great point on giving applications flexibility, in particular when it comes to leveraging subgraphs created and maintained by other developers.
Looking forward to catching up on this next week.


Hey @huangkuan and @adamfuller - I’ve been building something very much like what you describe for the past 10 years or so, based on linked data / the semantic web.

It basically maps any data (for example, from GraphQL) to typed linked data, puts it in the global linked data graph, and then offers interface components based on the chosen universal types/schema (like Person, Token, Project, etc.). You can build and click together interfaces and visualisations of the data, and finally publish it as a fully functioning Web3 app.

After years of developing the underlying tech, I’m now launching it as a project/product and getting more people involved. I just launched a very early, rough website: www.semantu.com

More demos - including visualization of data from The Graph - coming soon!
If this interests anyone, get in touch with me for a 1-on-1; I would love to connect with some people from The Graph to explore this further.

Hi everyone, let me share with you some great progress we have made on the library:

Demo: AAVE vault deposit amount from Jan 1st to Jan 10th

This demo shows how little code is required to load data from a subgraph and display it in a line chart. Earlgrey is the name of the library.

Highlights:
load_subgraph: This function does all the magic of loading data from a subgraph efficiently

  • It bypasses the page limit on The Graph side. You can simply pass the start time and end time in the query, and it takes care of the rest.
  • It automatically converts string-typed data (which is how The Graph returns data) to its proper type based on the subgraph’s schema.
  • If your GraphQL query contains multiple entities, this function loads them concurrently to save time.

plot:
Plots the data on a line chart.
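Based on the description above, usage presumably looks something like the sketch below. Only load_subgraph and plot are named in this post; the import path, endpoint, query, and argument names are my guesses, not Earlgrey’s documented API:

```python
# Guessed Earlgrey usage for the AAVE deposits demo; actual signatures
# may differ. Only load_subgraph and plot are named in the post above.
from earlgrey import load_subgraph, plot  # assumed import path

# Old hosted-service endpoint for the AAVE v2 subgraph (illustrative).
AAVE_SUBGRAPH = "https://api.thegraph.com/subgraphs/name/aave/protocol-v2"

# Jan 1, 2021 00:00 UTC .. Jan 11, 2021 00:00 UTC covers Jan 1st-10th.
query = """
{
  deposits(where: { timestamp_gte: 1609459200, timestamp_lt: 1610323200 }) {
    timestamp
    amount
  }
}
"""

# Pagination, type conversion, and concurrent entity loading all happen
# inside load_subgraph, per the highlights above.
data = load_subgraph(AAVE_SUBGRAPH, query)

# Render the deposits as a line chart.
plot(data["deposits"], x="timestamp", y="amount")
```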

cc @adamfuller

Here is another demo that shows a slightly more complicated use case:

Go to the same link https://earlgrey-demo.herokuapp.com/ and click the arrow in the top-left corner to see the second demo.

aggregate_timeseries aggregates data by time intervals (hourly, daily, monthly)
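Conceptually this is the same operation as a Pandas resample. A rough equivalent, assuming a DataFrame with a datetime timestamp column (the real aggregate_timeseries signature may differ):

```python
# Rough Pandas equivalent of what aggregate_timeseries appears to do:
# bucket rows into hourly/daily/monthly intervals and sum each bucket.
import pandas as pd

def aggregate_timeseries_equivalent(df: pd.DataFrame,
                                    interval: str = "daily",
                                    value_col: str = "amount") -> pd.DataFrame:
    freq = {"hourly": "H", "daily": "D", "monthly": "MS"}[interval]
    return df.resample(freq, on="timestamp")[value_col].sum().reset_index()
```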

Hi @huangkuan this does indeed look like great progress! Is the code open source? I would be interested to take a look if so (the “How to install” link seemed to be broken).
How are you determining the time for blocks - are you using the ethereum-blocks subgraph?
What do you mean by “If your graph ql contains multiple entities, this function loads them concurrently to save time.”?

Hi @adamfuller Our goal is to find an efficient solution for working with large data sets. We are experimenting with a few approaches:

  1. In this demo, we use one queue for each entity.
  2. We are also trying to get the data without breaking it down by entity.

In a scenario where we query three entities X, Y, and Z, which have x, y, and z thousand records respectively:
With approach #1, the data is fetched in x + y + z requests across 3 concurrent queues.
With approach #2, the data is fetched in max(x, y, z) requests in 1 queue.
Obviously, #2 makes fewer requests; we are still testing the timing cost. But in case of failure, #2 would have less data to re-fetch.
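For example, with x = 5, y = 3, z = 4 (at 1,000 records per page), approach #1 issues 12 requests across 3 queues, while #2 issues 5 requests in 1 queue. Below is a minimal sketch of approach #1; asyncio + aiohttp are my choice for illustration, not necessarily what the library uses:

```python
# Sketch of approach #1: one pagination queue per entity, with the
# queues running concurrently. Illustrative only.
import asyncio
import aiohttp

PAGE = 1000  # The Graph returns at most 1,000 records per page

async def fetch_entity(session, url, entity, fields):
    """Page through one entity's records sequentially (one queue)."""
    rows, skip = [], 0
    while True:
        query = f"{{ {entity}(first: {PAGE}, skip: {skip}) {{ {fields} }} }}"
        async with session.post(url, json={"query": query}) as resp:
            page = (await resp.json())["data"][entity]
        rows += page
        if len(page) < PAGE:
            return rows
        # Note: The Graph caps `skip`, so production code would paginate
        # by id or timestamp instead, as load_subgraph does internally.
        skip += PAGE

async def fetch_all(url, entities):
    # The per-entity queues run concurrently, so wall-clock time is
    # roughly max(x, y, z) pages rather than x + y + z.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_entity(session, url, e, f) for e, f in entities.items()]
        return await asyncio.gather(*tasks)
```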

Hi everyone,

We have made our first step! I want to share with you an initial version of a library that we have been working on for the past 6 weeks. The library is called Bubbletea.

The idea here is to enable developers and data scientists to quickly build data applications directly on top of The Graph network. We would love for interested people to give it a spin and share the features you would like to see or any issues you discover along the way.
