Hi everyone,
I’m working on a subgraph that processes a fairly large dataset, and I’ve been running into performance issues that I’m hoping to get some guidance on. The dataset contains millions of entities, and while the subgraph syncs without errors, both sync speed and query response times are much slower than I’d like.
Here are the details of my current setup:
- Mapping Functions: I use mapping functions to index new entities and update existing ones, but processing slows down significantly as the dataset grows. Are there best practices for structuring mapping functions for large datasets? (A simplified version of one of my handlers is sketched after this list.)
- Entity Relationships: My schema has several one-to-many and many-to-many relationships, and I suspect the way I’ve designed them is contributing to the slowdown. Would denormalizing the schema help in this case? (A trimmed-down piece of my schema is included below.)
- Filtering and Pagination: When querying data through the GraphQL API, I’m using filters and pagination, but the response times are still longer than I’d like. Are there query-side optimizations I can apply to speed up API responses? (An example of the kind of query I run is shown below as well.)
- Indexing Performance: Are there specific methods or configurations to improve the indexing performance during the sync process, especially for event-heavy contracts?
- Node Infrastructure: Lastly, I’m running my own Graph Node, and I wonder if hardware specs or network configurations could be a bottleneck. Is there a recommended setup for handling high-throughput subgraphs?
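For context on the mapping question, here is a simplified sketch of the pattern my handlers follow. The contract, event, entity, and field names are placeholders rather than my actual code, but the load/create/save flow is the same:

```typescript
import { BigInt } from "@graphprotocol/graph-ts"
// Placeholder generated types: "Transfer" stands in for my real event,
// "Account" for one of my schema entities.
import { Transfer } from "../generated/Token/Token"
import { Account } from "../generated/schema"

export function handleTransfer(event: Transfer): void {
  // Load the existing entity, or create it the first time it is seen.
  let id = event.params.to.toHexString()
  let account = Account.load(id)
  if (account == null) {
    account = new Account(id)
    account.balance = BigInt.fromI32(0)
  }
  account.balance = account.balance.plus(event.params.value)
  // A single save per handler invocation; I try to avoid repeated
  // load/save round trips for the same entity within one handler.
  account.save()
}
```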
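This is roughly how one of my one-to-many relationships is modeled (again with illustrative names): the child entity holds the reference, and the parent’s list is a reverse lookup via `@derivedFrom` rather than a stored array:

```graphql
type Account @entity {
  id: ID!
  balance: BigInt!
  # Reverse lookup; the list is derived at query time instead of being
  # stored and rewritten on the Account entity itself.
  transfers: [Transfer!]! @derivedFrom(field: "from")
}

type Transfer @entity {
  id: ID!
  from: Account!
  to: Account!
  value: BigInt!
}
```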
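And on the query side, this is the shape of the filtered, paginated query I’m currently running (offset-based with `first`/`skip`; entity and field names are placeholders):

```graphql
{
  transfers(
    first: 100
    skip: 5000
    orderBy: value
    orderDirection: desc
    where: { value_gt: "1000000000000000000" }
  ) {
    id
    value
    from { id }
    to { id }
  }
}
```

I’ve seen suggestions to replace deep `skip` offsets with an `id_gt`-style cursor, but I haven’t measured whether that makes a real difference at this scale.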
I’ve also looked through this related thread: https://forum.thegraph.com/t/a-process-for-specifying-the-subgraph-api-version-and-feature-support-matrix/python
I’d greatly appreciate any insights, resources, or examples of similar setups that have worked well for you. Also, if there are common pitfalls to avoid in large-scale subgraph development, please share your experiences.
Thanks in advance for your help!