I think The Graph is a great way for data analysts, data scientists, machine learning developers and many other data consumers to access and analyse web3 data.
However, when I started to look around I didn’t find many resources for how to get started as a data scientist working with subgraph data. Even in the wider GraphQL context there is a lack of material for getting started as a Python user. This seems to be because:
a) it’s actually really simple to get going
b) in many cases the same person or team bundles the construction and consumption of data
I’ve put together this really simple guide and template for using subgraph data in Python. It covers constructing a query, sending the query, loading the response into a NumPy array, plotting, and loading the data into a pandas DataFrame. From there you can do basically anything in the Python realm with the data.
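For anyone who wants a feel for the steps before opening the repo, here is a minimal sketch of that flow. It is not the code from the guide: the endpoint URL, the entity name `swaps` and the fields `timestamp` and `amountUSD` are placeholders I made up for illustration, so swap in the ones from the subgraph you are actually querying. It also assumes that numeric fields come back as strings, which is common for large number types in subgraphs, so they are cast to float.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the query URL of your subgraph.
SUBGRAPH_URL = "https://example.com/subgraphs/name/your-org/your-subgraph"


def build_query(entity, fields, first=100):
    """Construct a simple GraphQL query string for one entity."""
    field_block = " ".join(fields)
    return f"{{ {entity}(first: {first}) {{ {field_block} }} }}"


def send_query(url, query):
    """POST the query as JSON and return the 'data' part of the response."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    if "errors" in payload:
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload["data"]


def rows_to_columns(rows, fields):
    """Turn a list of row dicts into a dict of column lists."""
    return {field: [row[field] for row in rows] for field in fields}


def to_numpy(rows, field):
    """Load one numeric field into a NumPy array (values assumed castable to float)."""
    import numpy as np  # imported here so the helpers above work without NumPy

    return np.array([float(row[field]) for row in rows])


def to_dataframe(rows):
    """Load all rows into a pandas DataFrame, one column per field."""
    import pandas as pd  # imported here so the helpers above work without pandas

    return pd.DataFrame(rows)


# Example usage (placeholder entity and fields, needs a real endpoint):
# query = build_query("swaps", ["timestamp", "amountUSD"], first=500)
# rows = send_query(SUBGRAPH_URL, query)["swaps"]
# amounts = to_numpy(rows, "amountUSD")
# df = to_dataframe(rows)
```

From the array or the DataFrame, plotting is one more line with whatever library you prefer, for example `df["amountUSD"].astype(float).plot()` if matplotlib is installed.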
If there is interest I can continue with this guide and show some more examples. For now the first part is available on GitHub.
Thanks for sharing, @danielzak. A nice starting point for sure.
FYI, for specific DeFi use cases, there’s also a Python-based library people can use to retrieve info like lending, borrowing and liquidity from multiple protocols (and subgraphs) following the same data model. The tool was developed by Keko, who received a grant from The Graph Foundation (wave 1). It facilitates ETL and could help people wanting to explore this sort of use case. (Small note: in this case, the data manipulation happens on the client side, post-indexing, not at the subgraph level.)
Thanks for the welcome @Pedro, and good to know about defi-crawler.
I’ll be happy to keep you all in the forum updated on what we are doing - it is related to integrating on-chain data and building examples in our platform for decentralized AI (I work at Scaleout).
Yes, we are streamlining the whole process into a developer library with a number of optimizations. The code is still quite disorganized, but we will open source it once it is better organized.
@danielzak Happy to chat offline if you are interested. kuan@scout.cool
Hi @adamfuller. @huangkuan, what you are building is really interesting and very complementary to what we are doing.
I started writing these simple examples as a way for data scientists to get started with machine learning in web3, as Python is the weapon of choice for ~80% of machine learning data scientists out there.
At Scaleout we are building a platform for decentralised AI. It provides an environment for machine learning development and workflows, from building and training a model to deploying it and serving predictions (aka MLOps). It runs anywhere from the cloud to bare metal. We also lead the development of one of the leading (I’m biased) frameworks for federated learning, which distributes the training of ML models to the data.
So, my interest lies in giving data scientists the best possible tools to build machine learning solutions in web3. I’d like to create a set or library of templates and examples that data scientists can use as starting points. Earlgrey looks great in this context.
Schematic view of a Web3 MLOps process that enables access to on-chain data, private decentralised data and siloed off-chain data: