Announcing DataCite’s First Public Data File

Today, we are releasing DataCite’s first public data file with metadata for over 52 million DOIs. 

DataCite DOI metadata has always been openly available. In line with our commitment to the POSI principles, we make all metadata registered with DataCite part of the public domain through a CC0 copyright waiver. With our metadata retrieval services—the DataCite REST API, OAI-PMH service, and GraphQL API—anyone can retrieve DataCite DOI metadata to enable discovery, promote reuse, and understand the research landscape.

As the number of DataCite DOIs continues to increase, harvesting the complete set of records via our existing tools inevitably takes longer than it once did. Compared with using our APIs, downloading the data file is a faster way to retrieve DataCite DOI metadata: instead of requesting the list of DOIs page by page, users can now download a single (compressed) file in one go.
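To make the contrast concrete, here is a minimal sketch of what page-by-page harvesting looks like. It assumes cursor-based paging on the DataCite REST API, where each response holds its records under `data` and the address of the next page under `links.next`. The starting URL, the page size, and the helper names `fetch_json` and `iter_records` are illustrative assumptions rather than an official client, so check the API documentation before relying on them.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen

# Assumed starting URL for cursor-based paging (square brackets percent-encoded).
START_URL = "https://api.datacite.org/dois?page%5Bcursor%5D=1&page%5Bsize%5D=1000"


def fetch_json(url: str) -> dict:
    """Fetch one page over HTTP and decode the body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_records(start_url: str = START_URL,
                 fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield records one page at a time, following links.next until it is absent."""
    url = start_url
    while url:
        page = fetch(url)
        # Assumption: records live under "data", the next-page URL under links.next.
        yield from page.get("data", [])
        url = page.get("links", {}).get("next")
```

Every page costs one HTTP round trip, so the total harvest time grows with the number of DOIs. The data file replaces that loop with a single download.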

The public data file contains metadata for DataCite DOIs in Findable state. Specifically, this first release contains metadata records in JSON format for all such DOIs registered up to the end of 2023. Each record carries descriptive metadata for a research output or resource, structured according to the DataCite Metadata Schema. Many of these records include links to other persistent identifiers (PIDs) for works (DOIs), people (ORCID iDs), and organizations (ROR IDs). Details of the data file format and structure are available in our support documentation. Going forward, we plan to release a complete public data file on an annual basis.
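As a rough illustration of working with these records, the sketch below counts the PID links mentioned above. It rests on two assumptions that the support documentation should confirm: that records are stored one JSON object per line in gzip-compressed files, and that each record follows the JSON shape used by the DataCite REST API (`creators`, `nameIdentifiers`, `affiliation`, and `relatedIdentifiers`, optionally nested under `attributes`). The function names `read_jsonl_gz` and `count_pid_links` are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Iterable, Iterator


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line (assumed gzip-compressed JSON Lines)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_pid_links(records: Iterable[dict]) -> Counter:
    """Count ORCID, ROR, and DOI links across records (assumed field names)."""
    counts = Counter()
    for record in records:
        # Accept both an API-style wrapper ("attributes") and a flat record.
        attrs = record.get("attributes", record)
        for creator in attrs.get("creators") or []:
            for ident in creator.get("nameIdentifiers") or []:
                if ident.get("nameIdentifierScheme") == "ORCID":
                    counts["orcid"] += 1
            for aff in creator.get("affiliation") or []:
                # Affiliations may be plain strings; only dicts carry a scheme.
                if isinstance(aff, dict) and aff.get("affiliationIdentifierScheme") == "ROR":
                    counts["ror"] += 1
        for rel in attrs.get("relatedIdentifiers") or []:
            if rel.get("relatedIdentifierType") == "DOI":
                counts["doi"] += 1
    return counts
```

Because `count_pid_links` accepts any iterable of records, it can consume `read_jsonl_gz(path)` lazily, without loading a whole file into memory.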

Part of the development work around the data file was made possible through support from the FAIRCORE4EOSC project. FAIRCORE4EOSC has received funding from the EU’s Horizon Europe research and innovation programme under Grant Agreement no. 101057264. The DataCite team would like to extend a special thank you to the partners on the FAIRCORE4EOSC project, who provided invaluable feedback during the development of the data file.

Everyone is welcome to use the DataCite public data file, from metadata harvesters to research institutions to bibliometricians. To provide access, we are piloting a new portal where you can request a link to download the public data file directly:

As this is our first release, we are eager to get feedback from early users to help guide the next phases of the work. We are also exploring a premium harvesting service to provide more robust access to the data file and enable customized snapshots. Please let us know what you are doing (or plan to do) with the data file, what is working well, what you need help with, and what you would like to see in the future by reaching out to us at