Why do we need curation?

I’ll start by saying that my team and I are fully on board with @yaniv’s vision of The Graph as something that empowers communities to organize and access all the world’s public knowledge and information in a decentralized way. I have personally shared this vision since my participation in the Curator program and the few grant applications I submitted to conduct research in that direction, and I’m confident @Pedro could attest to this.

Originally, I had envisioned myself as an active participant in the curation process. However, as time went on, I found my role evolving into that of a delegator and an enthusiastic user of the data that The Graph provides.

While I recognize the inherent value of curation, I’m beginning to question whether a formal role with specific incentives is truly necessary to fulfill The Graph’s vision. To make my case clear, I’ll be using the provided guiding questions to structure my thoughts.

What problems should it solve

From what I have seen, there are a few suggested outcomes that the role of curator would be driving. They can be roughly organized into three categories.

Service Catalog

This corresponds to the key problem that you (@Rem) have highlighted in your response, and roughly to The Graph’s 2nd layer as proposed by Yaniv. In that sense, we definitely need a way to find, understand and trust the services available on the network.

This would also be the function that is closest to what curation as it exists today seeks to fulfill, by providing a metric (signal) to gauge the relative quality of subgraphs as a proxy for trust.

However, the search (find) and presentation (understand) functions are both implemented in peripheral tooling, through the explorer, rather than through curation itself.

Graph of Data

This would be another aspect of Yaniv’s vision given the new roadmap, the original goal of curation and what has been started with Geo. This was the function that got me excited about The Graph in the first place (and still does).

The goal here would be to leverage the wisdom of crowds in order to build out and grow a unified digital representation of knowledge, in line with Tim Berners-Lee’s original proposal for the Semantic Web (see the Wikipedia article on the Semantic Web), sometimes called Web 3.0 as opposed to web3.

This also touches upon the “understand” part of the Service Catalog section. There would definitely be value in building out a knowledge graph that could allow users to quickly see what kind of data the services on The Graph can serve, how the data components relate to each other both within and between services, and how they map to onchain entities.

Community Building

The last (but not least) category involves curation acting as a powerful catalyst for community engagement. By equipping community members with the appropriate tools and incentives, we can encourage a collective effort to gather information. This approach invites both developers and non-developers to actively participate and spark meaningful conversations about the best ways to organize and assign value to information.

As @Athsrueas has highlighted, this aspect of curation could also ensure that a portion of the rewards is directed towards valuable projects that might otherwise struggle to sustain themselves financially.

In essence, establishing a role for curation that is profitable, inclusive, and adds value to the protocol can significantly contribute to nurturing communities, supporting public goods, and enhancing overall trust within the ecosystem.

What would good curation look like

Service Catalog

Find and Understand: As mentioned in the previous section, this could be addressed with a more powerful search function: a way to search not only the available services but also the various subcomponents of these services. For example, what if the explorer allowed users to search entities within subgraphs? Or, when you pull up a subgraph, you could see all the other subgraphs that index similar contracts and quickly scan how their schemas differ. There are plenty of effective data catalog designs out there.
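To make the entity search and schema comparison ideas a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the two schemas are made up, the function names (`parse_entities`, `search_entities`, `diff_schemas`) are hypothetical, and it uses a simple regex rather than a real GraphQL parser, so it only handles plain `type ... { ... }` blocks. It is not how the explorer works today.

```python
import re

# Hypothetical GraphQL schemas for two subgraphs indexing similar contracts.
SCHEMA_A = """
type Pool @entity {
  id: ID!
  token0: Token!
  token1: Token!
  volumeUSD: BigDecimal!
}

type Token @entity {
  id: ID!
  symbol: String!
}
"""

SCHEMA_B = """
type Pool @entity {
  id: ID!
  token0: Token!
  token1: Token!
  feeTier: BigInt!
}

type Swap @entity {
  id: ID!
  pool: Pool!
}
"""

# Simplified patterns: one for "type Name ... { body }", one for "field:" at line start.
TYPE_RE = re.compile(r"\btype\s+(\w+)[^{]*\{([^}]*)\}")
FIELD_RE = re.compile(r"^\s*(\w+)\s*:", re.MULTILINE)


def parse_entities(schema):
    """Map each entity type name to the set of its field names."""
    return {name: set(FIELD_RE.findall(body)) for name, body in TYPE_RE.findall(schema)}


def search_entities(catalog, term):
    """Return sorted (subgraph, entity) pairs whose entity name contains `term`."""
    term = term.lower()
    return sorted(
        (subgraph, entity)
        for subgraph, schema in catalog.items()
        for entity in parse_entities(schema)
        if term in entity.lower()
    )


def diff_schemas(a, b):
    """Summarize which entities and fields differ between two schemas."""
    ea, eb = parse_entities(a), parse_entities(b)
    return {
        "only_in_a": sorted(ea.keys() - eb.keys()),
        "only_in_b": sorted(eb.keys() - ea.keys()),
        "field_diffs": {
            name: {
                "only_in_a": sorted(ea[name] - eb[name]),
                "only_in_b": sorted(eb[name] - ea[name]),
            }
            for name in sorted(ea.keys() & eb.keys())
            if ea[name] != eb[name]
        },
    }


if __name__ == "__main__":
    catalog = {"subgraph-a": SCHEMA_A, "subgraph-b": SCHEMA_B}
    print(search_entities(catalog, "pool"))
    print(diff_schemas(SCHEMA_A, SCHEMA_B))
```

The point is only that both features reduce to fairly ordinary catalog operations over schemas that are already public, which is why I think they fit better in tooling than in an incentivized onchain role.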

Trust: Unfortunately, trust being hard to establish is baked into the human condition, and I don’t think there are ways around it apart from approximating it through the track record of the service, through the reputation of its author / developer, or through some form of benchmarking. From personal experience, when trying to analyze a protocol’s activity, I usually default to selecting the subgraph written by Messari without even looking at the signal. In the end, brands carry expectations. And if we consider the history of review sites, depending too much on others for assessing quality can sometimes lead to bad situations.

Graph of Data

I find the idea of crowdsourcing an open knowledge graph incredibly elegant. In a world where AI functionality becomes ubiquitous, well-structured consensus knowledge will be crucial in order to align and ground the outputs generated by large language models.

This structured knowledge comes in two flavors, both of which show up in the context of The Graph.

  1. Knowledge Graphs: These are systems designed to capture and represent entities and their interrelations within a specific domain, typically housed within graph databases like Neo4j. Several actively maintained knowledge graphs exist today, such as the Google Knowledge Graph, WikiData, the Wolfram Knowledge Base, along with Geo and other specialized taxonomies and ontologies (though not all may be implemented as traditional knowledge graphs).
  2. Semantic Layers: While they share conceptual similarities with knowledge graphs, semantic layers serve as a framework for organizing data sources via their schemas, often built atop SQL databases. Their purpose is to act as a bridge between raw data storage and its contextual meaning within a business environment. This approach is more prevalent than knowledge graphs due to the widespread use of SQL. In The Graph’s ecosystem, this is partially reflected in the GraphQL schemas used for serving subgraph data, and becomes even more apparent with Substreams:SQL, where the output of substreams is modeled using dbt (data build tool).
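The difference between the two flavors is easier to see side by side, so here is a toy sketch in Python. All of the data is illustrative and all of the names (`TRIPLES`, `SEMANTIC_LAYER`, `objects_of`, `resolve`, the `PoolDayData` entity and its fields) are hypothetical; a real knowledge graph would live in a graph database, and a real semantic layer would be defined in something like dbt rather than a dictionary.

```python
# Flavor 1: knowledge graph. Facts are stored as (subject, predicate, object)
# triples, and queries follow relationships between entities.
TRIPLES = [
    ("Uniswap", "is_a", "DecentralizedExchange"),
    ("Uniswap", "deployed_on", "Ethereum"),
    ("Uniswap", "lists", "USDC"),
    ("USDC", "is_a", "Stablecoin"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


# Flavor 2: semantic layer. Business terms are mapped onto fields of an
# existing schema (here, a made-up subgraph entity), plus how to aggregate them.
SEMANTIC_LAYER = {
    "daily trading volume": {
        "entity": "PoolDayData",
        "field": "volumeUSD",
        "aggregation": "sum",
    },
    "active pools": {
        "entity": "PoolDayData",
        "field": "pool",
        "aggregation": "count_distinct",
    },
}


def resolve(term):
    """Translate a business term into an expression over the underlying schema."""
    mapping = SEMANTIC_LAYER[term]
    return f"{mapping['aggregation']}({mapping['entity']}.{mapping['field']})"


if __name__ == "__main__":
    print(objects_of("Uniswap", "is_a"))
    print(resolve("daily trading volume"))
```

In the first case the curated artifact is the set of facts itself; in the second it is the mapping between human vocabulary and data that already exists. Both need someone to maintain them, which is where the incentive question comes in.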

Some emerging technologies, like Relational AI, are attempting to merge knowledge graphs and semantic layers into a unified system. However, these technologies are still in their infancy and will likely necessitate the development of new tech stacks.

Curation plays a critical role in both scenarios, as curators are tasked with constructing and organizing these knowledge graphs and semantic layers. The challenge, however, lies in devising an incentive model that effectively encourages crowdsourcing for these complex tasks while also maintaining overall coherence across time and domains.

Given the technical intricacies involved in structuring and semantically defining any domain—particularly with smart contracts, which often have unique edge cases as seen in Messari’s work—this type of endeavor might be best suited for specialized teams, such as ontologists for knowledge graphs and analytics engineers for semantic layers. The question then becomes: What incentive model could ensure stable employment and predictable income for these teams?

Take, for example, Golden, a knowledge graph initiative that sought to expedite its growth by crowdsourcing entries with on-chain incentives. Unfortunately, I’ve been unable to find recent updates on the project’s progress, and it appears to be inactive.

Community Building

I don’t have as much to elaborate on for this point. It’s been wonderful to see various communities independently forming, sometimes with the support of the foundation, and they all appear to be thriving. I’m particularly familiar with The Graph Builders DAO, which has done an outstanding job (kudos to @DataNexus and the entire team, sorry I don’t know the different Discord / forum handles). They’ve been instrumental in establishing best practices for writing substreams and subgraphs, training new developers, and bringing new projects on board.

In terms of fostering this type of community, a combination of grants, bounties, and salaries has proven to be highly effective. It has attracted individuals who agree on clear objectives, enabling them to collaborate and develop services that others find valuable. This, in turn, contributes to the overall sustainability of the network.

Critical Component

While I’m still in the process of fully understanding the Horizon proposal, my current grasp is that it significantly increases the adaptability of The Graph’s offerings. It does this by positioning GRT and staking as a “business medium” for data, which in turn enables various teams to provide unique and differentiated services.

Building on this idea, and reflecting on @Mr.1776’s remarks about Geo being a data service, it seems that many of the functions we associate with curation could be more effectively realized within the services offered on the network. This approach could keep the required onchain interactions as simple as possible. For instance, as previously suggested, a data service could be established specifically to monitor and categorize other data services on the network. Such a service would not only stand out as a unique offering but would also encourage the development of specialized expertise to align data offerings with user needs.

This brings me to my concluding thought. I am wholly convinced that we can fulfill Yaniv’s vision of organizing the world’s public knowledge in a decentralized manner, and in doing so, create services that are as indispensable to users as Google or Wikipedia. However, I believe the key to achieving this lies in offering services that are compelling enough to be chosen over other alternatives. This can only be accomplished by focusing on the needs of users exactly where they are. By doing so, we would cater not just to developers, but also to business professionals, researchers, everyday consumers, and potentially even AI agents, by providing them with the data they need in a format that integrates effortlessly into their existing workflows.

I’m eager to hear everyone’s thoughts on the future of curation and please let me know if there’s anything I might have overlooked!