Getting started with subgraph data in Python

Hi fellow data lovers :wave:

I think TheGraph is a great way for data analysts, data scientists, machine learning developers and many other (data) end consumers to access and analyse web3 data.

However, when I started to look around I didn’t find many resources for how to get started as a data scientist working with subgraph data. Even in the wider GraphQL context there is a lack of material for getting started as a Python user. This seems to be because:
a) it’s actually really simple to get going
b) in many cases the same person or team bundles the construction and consumption of data

I’ve put together this simple guide and template for using subgraph data in Python. It covers constructing a query, sending it, loading the response into a NumPy array, plotting, and loading the data into a pandas DataFrame. From there you can do basically anything in the Python realm with the data.
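For anyone who wants a feel for those steps before opening the notebook, here is a minimal sketch. It is not the notebook's code: the endpoint URL, the `swaps` entity and its fields are placeholders I made up, and the helper names are hypothetical. The request part uses only the standard library.

```python
import json
import urllib.request

# Hypothetical endpoint: replace it with the query URL of your subgraph.
SUBGRAPH_URL = "https://example.com/subgraphs/name/your-org/your-subgraph"

# "swaps", "timestamp" and "amountUSD" are assumed names; use your subgraph's schema.
QUERY = """
query ($first: Int!) {
  swaps(first: $first, orderBy: timestamp, orderDirection: desc) {
    id
    timestamp
    amountUSD
  }
}
"""


def build_payload(query, variables=None):
    """Encode a GraphQL query and its variables as a JSON request body."""
    return json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")


def extract_rows(body, entity):
    """Return the entity list from a decoded response, raising on GraphQL errors."""
    if body.get("errors"):
        raise RuntimeError(f"GraphQL errors: {body['errors']}")
    return body["data"][entity]


def run_query(url, query, entity, variables=None, timeout=30):
    """POST the query to the subgraph endpoint and return the matching rows."""
    request = urllib.request.Request(
        url,
        data=build_payload(query, variables),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_rows(json.load(response), entity)


def to_columns(rows, fields):
    """Cast the chosen fields to float; subgraphs return BigInt/BigDecimal as strings."""
    return {field: [float(row[field]) for row in rows] for field in fields}


def to_dataframe(rows, fields):
    """Build a pandas DataFrame (imported here so the helpers above stay stdlib-only)."""
    import pandas as pd

    return pd.DataFrame(to_columns(rows, fields))
```

Usage would look something like `rows = run_query(SUBGRAPH_URL, QUERY, "swaps", {"first": 100})` followed by `df = to_dataframe(rows, ["timestamp", "amountUSD"])`. From there `df.to_numpy()` gives a NumPy array and `df.plot(x="timestamp", y="amountUSD")` a quick plot (the latter needs matplotlib installed).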

If there is interest I can continue with this guide and show some more examples. For now the first part is available on GitHub.

Resources:
Link: thegraph-intro/0.1-Intro.ipynb at main · danielzak/thegraph-intro · GitHub
My Twitter for updates: https://twitter.com/danielzak

That’s really great Daniel! We need a lot more documentation so everyone taking matters into their own hands is contributing greatly to the ecosystem.

Thanks for sharing, @danielzak A nice starting point for sure.

FYI, for specific DeFi use cases, there’s also a Python-based library that people can use to retrieve information such as lending, borrowing and liquidity from multiple protocols (and subgraphs) following the same data model. The tool has been developed by Keko, who received a grant from The Graph Foundation (wave 1). It facilitates ETL and could help people wanting to explore this sort of use case. A small note: in this case, data manipulation happens client-side (post-indexing), not at the subgraph level.

Curious to know how you see this progressing :+1:

Welcome to the Forum, btw! :smiley:

Thanks for the welcome @Pedro, and good to know about defi-crawler :slight_smile:

I’ll be happy to keep you all in the forum updated on what we are doing - it is related to integrating on-chain data and building examples in our platform for decentralized AI (I work at Scaleout).

Just added another example - monitoring a smart contract for some specific events using polling.

Monitoring a smart contract with TheGraph and Python

The repo can be found at GitHub - danielzak/thegraph-intro: Getting started with TheGraph in Python
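To give a rough idea of what polling for events can look like, here is a small sketch. It is not the code from the repo: the `transfers` entity, its `blockNumber` field and the `fetch` callable are assumptions for illustration. The idea is to remember the highest block seen so far, ask only for events at or after it, and skip ids that were already handled.

```python
import time

# Assumed entity and field names; adjust them to the subgraph you are monitoring.
EVENTS_QUERY = """
query ($lastBlock: BigInt!) {
  transfers(first: 100, orderBy: blockNumber, orderDirection: asc,
            where: {blockNumber_gte: $lastBlock}) {
    id
    blockNumber
  }
}
"""


def poll_events(fetch, handle, start_block=0, interval=15.0, max_polls=None, sleep=time.sleep):
    """Call handle(event) once for every new event and return the last block seen.

    fetch(last_block) is a hypothetical callable that runs EVENTS_QUERY and
    returns a list of dicts ordered by ascending blockNumber.
    Limitation: assumes a single block never holds more events than one page.
    """
    last_block = start_block
    seen_ids = set()  # ids already handled at last_block
    polls = 0
    while max_polls is None or polls < max_polls:
        for event in fetch(last_block):
            if event["id"] in seen_ids:
                continue  # returned again because the filter is inclusive
            block = int(event["blockNumber"])  # BigInt values arrive as strings
            if block > last_block:
                last_block = block
                seen_ids = set()
            seen_ids.add(event["id"])
            handle(event)
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)  # wait before asking the subgraph again
    return last_block
```

Passing `max_polls=None` keeps the loop running indefinitely; `fetch` and `sleep` are injected so the loop can be tried out without a live endpoint.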

Hi @danielzak thanks so much for sharing. You might also be interested in this thread: A framework to build any data applications entirely on the decentralized Web3 infrastructures - #6 by huangkuan
where @huangkuan is working on a python library that facilitates extraction of data from subgraphs for analysis and visualisation.
It would be great to hear more about your use cases - are you focused on exploratory / ad hoc analyses and visualisations, or are you also looking at building ongoing / larger models?

Yes, we are streamlining the whole process into a developer library with a bunch of optimizations. The code is still quite disorganized, but we will open source it once it is better organized.

@danielzak Happy to chat offline if you are interested. kuan@scout.cool

Hi @adamfuller :wave:. @huangkuan what you are building is really interesting and very complementary to what we are doing.

I started writing these simple examples as a way for data scientists to get started with machine learning in web3, as Python is the weapon of choice for ~80% of machine learning data scientists out there.

At Scaleout we are building a platform for decentralised AI. It provides an environment for machine learning development and workflows, from building and training a model to deploying it and serving predictions (aka MLOps). It runs anywhere from the cloud to bare metal. We also lead the development of one of the leading (I’m biased :slight_smile: ) frameworks for federated learning, which distributes the training of ML models to the data.

So, my interest lies in giving data scientists the best possible tools to build machine learning solutions in web3. I’d like to create a set or library of templates and examples that data scientists can use as starting points. Earlgrey looks great in this context.

Schematic view of a Web3 MLOps process that enables access to on-chain data, private decentralised data and siloed off-chain data:

P.S. I just added a pagination example to the repo above, if anyone wants to see how to build that from scratch.
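For a quick idea of what pagination can look like, here is a sketch. It is not the code from the repo and may differ from it: it pages with an `id_gt` cursor rather than `skip`, and the `transfers` entity, the `value` field and the `fetch_page` callable are assumptions for illustration. It also assumes string ids, so that an empty string sorts before every id.

```python
# Assumed entity and field names; the cursor is the id of the last row seen.
PAGE_QUERY = """
query ($first: Int!, $lastId: ID!) {
  transfers(first: $first, orderBy: id, orderDirection: asc,
            where: {id_gt: $lastId}) {
    id
    value
  }
}
"""


def fetch_all(fetch_page, page_size=1000, max_pages=None):
    """Collect every row by requesting pages until a short page comes back.

    fetch_page(first, last_id) is a hypothetical callable that runs PAGE_QUERY
    and returns a list of dicts ordered by ascending id.
    """
    rows = []
    last_id = ""  # start before the first id
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch_page(page_size, last_id)
        rows.extend(page)
        pages += 1
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        last_id = page[-1]["id"]  # continue after the last row received
    return rows
```

`max_pages` is just a safety cap for experiments, so a large subgraph does not get downloaded in full by accident.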