Wikidata:SPARQL query service/WDQS graph split
Wikidata contains a lot of data. It has grown so large that a single Wikidata Query Service instance can no longer handle it together with the volume of edits and queries it receives. To stabilize the Wikidata Query Service, it has been split into two distinct query services. This page explains this graph split and what it means for you.
The details of the split
query.wikidata.org used to contain all the data that is in Wikidata. Going forward this is no longer the case. The full graph is split into two distinct graphs: the main graph and the scholarly graph. The split happened on 9 May 2025.
For the scholarly graph a second query service is running at https://query-scholarly.wikidata.org/. It contains the data from entities that match any of the following criteria:
- it has an instance of (P31):scholarly article (Q13442814) statement and the statement is not deprecated (see the full list of instance of (P31) values taken into account here: Wikidata:SPARQL_query_service/WDQS_graph_split/Rules#Scholarly_Articles)
- it has a publication type of scholarly work (P13046) statement and the statement is not deprecated
The main graph is on query.wikidata.org and contains the data from all entities that do not match the above criteria.
For more information about the rules of the split see Wikidata:SPARQL query service/WDQS graph split/Rules.
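The criteria above can be read as a simple decision rule. The following is a minimal sketch in Python, not the actual implementation of the split. The `Statement` class and the `target_graph` function are hypothetical names introduced for illustration, and the set of instance of (P31) values is assumed to contain only scholarly article (Q13442814); the Rules page lists the full set that is taken into account.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: only the value named on this page. The Rules page lists
# further instance of (P31) values that are taken into account.
SCHOLARLY_P31_VALUES = {"Q13442814"}


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified view of a Wikidata statement."""
    property_id: str      # e.g. "P31"
    value: str            # e.g. "Q13442814"
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def target_graph(statements: Iterable[Statement]) -> str:
    """Return "scholarly" if any non-deprecated statement matches a criterion."""
    for statement in statements:
        if statement.rank == "deprecated":
            continue  # deprecated statements are ignored by both criteria
        if statement.property_id == "P31" and statement.value in SCHOLARLY_P31_VALUES:
            return "scholarly"
        if statement.property_id == "P13046":
            return "scholarly"
    return "main"
```

For example, an entity whose only instance of (P31):scholarly article (Q13442814) statement is deprecated, and which has no publication type of scholarly work (P13046) statement, would stay in the main graph under this reading.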
Endpoints
These different Wikidata SPARQL query services and endpoints are available:
- https://query.wikidata.org/, with only the "main" graph,
- https://query-scholarly.wikidata.org/, with only the "scholarly" graph,
- https://query-legacy-full.wikidata.org/, a legacy endpoint, unsplit, scheduled to run until December 2025 (limited availability - should only be used if absolutely necessary).
What this means for your queries
Queries which do not touch data from entities in the scholarly graph will continue to function as before the graph split.
Queries that require data from both the main graph and the scholarly graph will need to use SPARQL's federation feature (the SERVICE keyword) to combine data from both endpoints in their results.
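As an illustration of what such a federated query can look like, here is a minimal sketch in Python that builds a query meant to run on the main endpoint and to fetch matching articles from the scholarly endpoint through a SERVICE clause. Several things here are assumptions rather than facts from this page: the `/sparql` path on both hosts, the use of the full endpoint URL as the SERVICE target (the Internal federation guide describes the recommended form), the example property author (P50), and the helper names `build_federated_query` and `run_query`.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: both services expose their SPARQL endpoint under /sparql.
MAIN_ENDPOINT = "https://query.wikidata.org/sparql"
SCHOLARLY_ENDPOINT = "https://query-scholarly.wikidata.org/sparql"


def build_federated_query(author_qid: str, limit: int = 10) -> str:
    """Build a query for the main endpoint that pulls articles from the scholarly graph."""
    if not re.fullmatch(r"Q[1-9]\d*", author_qid):
        raise ValueError(f"not a valid item ID: {author_qid!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # The author item and its label live in the main graph; the articles
    # that point to the author are fetched inside the SERVICE block.
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?authorName ?article WHERE {{
  VALUES ?author {{ wd:{author_qid} }}
  ?author rdfs:label ?authorName .
  FILTER(LANG(?authorName) = "en")
  SERVICE <{SCHOLARLY_ENDPOINT}> {{
    ?article wdt:P50 ?author .
  }}
}}
LIMIT {limit}"""


def run_query(query: str, endpoint: str = MAIN_ENDPOINT) -> list:
    """Send the query over HTTP GET and return the JSON result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            # Placeholder value: replace with a User-Agent that identifies your tool.
            "User-Agent": "graph-split-example/0.1 (hypothetical tool)",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_federated_query("Q42"))` would send the query to the main endpoint. Keeping the query builder separate from the HTTP call makes it easy to inspect the generated SPARQL, or to paste it into the query service UI, before running it.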
A transitional query service containing all data is available at query-legacy-full.wikidata.org until December 2025. It is, however, limited in resources and should only be used for tools that absolutely require the extended migration time. Additionally, there are a number of alternative endpoints that you can use. They are run by other organisations and still hold the full graph.
The reasons behind the split
The underlying issue that we are addressing is the medium-term scalability and stability of the Wikidata Query Service; problems in either area can hinder access to the data in Wikidata and the ability to query it. The Wikidata Query Service runs on top of Blazegraph, and comprises over 16 billion triples. The graph is currently growing at the rate of 1 billion triples per year.
With the current size and growth of the graph, we are experiencing a number of scalability issues:
- The reloading (rebuilding) of the graph from the Wikidata dumps takes between 1 and 2 months, sometimes more. This is partly because the operation itself is long, and partly because the process can crash unpredictably once the graph reaches a certain size, requiring it to be restarted
- More frequent stability issues with WDQS
- The queries are taking a longer time to run, with more frequent timeouts
The ability to reload the graph is a critical function: it ensures data consistency and makes it possible to recover from potential critical data issues. It is also an indication of the stability and scalability of the system. Furthermore, the instability of the data reload process is directly linked to the size of the graph, in the same way that the runtime stability of WDQS is linked to the size of the graph.
FAQs
- Does this affect the data in Wikidata itself?
No. The graph split only splits the current query graph, moving the scholarly articles to a separate graph in Blazegraph. It does not change how the data is accessed or edited on the Wikidata website and through other APIs.
- What if I need help to rewrite a query that doesn't work anymore following the split?
You can get help with your query at Wikidata:Request a query, or run it on one of the alternative endpoints that hold the full graph.
- Where can I learn more about SPARQL federation?
The Internal federation guide has more details for you.
- How can I find out if a tool I rely on has been adapted to the graph split?
Please see Wikidata:SPARQL query service/WDQS graph split/Affected tools.
- Why did you decide on these criteria for the split?
Please see Wikidata:SPARQL query service/WDQS graph split/WDQS Split Refinement.
- Where should feedback be sent or questions asked?
You can use the talk page, write on Wikidata:Report a technical problem/WDQS and Search or join one of the next Search Platform Team or Wikidata office hours.