Introducing DataCite Metadata Schema 4.5

https://doi.org/10.5438/jvkk-8198

At DataCite, we make metadata for over 50 million DOIs openly available to enable the discovery of research outputs and related resources. This metadata is created by DataCite’s Members and Consortium Organizations according to the DataCite Metadata Schema. When registering a DataCite DOI, repositories submit metadata to improve discoverability and reusability.

Today, we are releasing version 4.5 of the DataCite Metadata Schema. 

Developed by the Metadata Working Group in coordination with DataCite staff, this release includes several changes to help repositories create richer and more accurate metadata. Full details are available in the version update notes within the documentation. In addition to these changes, we’ve refreshed the schema documentation format to make it easier to navigate.

Support for Instruments

Schema 4.5 adds support for instruments to the DataCite Metadata Schema. Instrument is now an option for resourceTypeGeneral, helping repositories consistently identify and reference instruments. Connections between instruments and the data they collect can be indicated using RelatedIdentifier with the new relationTypes IsCollectedBy and Collects. For guidance on creating metadata for instruments, the PIDINST Schema Mapping is included in the documentation.
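Here is a minimal Python sketch of the relatedIdentifiers entries a dataset and an instrument might carry to point at each other using the two new relationTypes. It assumes the JSON attribute names used by the DataCite REST API. The DOIs use the 10.5072 example prefix and are placeholders, and the `link` helper is hypothetical.

```python
# Hypothetical example: linking a dataset and the instrument that collected it
# with the Schema 4.5 relationTypes. Keys follow DataCite REST API JSON
# attribute names; the DOIs are placeholders, not real registrations.

INVERSE = {"IsCollectedBy": "Collects", "Collects": "IsCollectedBy"}


def link(target_doi: str, relation: str, target_type: str) -> dict:
    """Build one relatedIdentifiers entry (hypothetical helper)."""
    if relation not in INVERSE:
        raise ValueError(f"unexpected relationType: {relation}")
    return {
        "relatedIdentifier": target_doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation,
        "resourceTypeGeneral": target_type,  # type of the *related* resource
    }


DATASET_DOI = "10.5072/example-dataset"        # placeholder
INSTRUMENT_DOI = "10.5072/example-instrument"  # placeholder

dataset = {
    "doi": DATASET_DOI,
    "types": {"resourceTypeGeneral": "Dataset"},
    # The dataset IsCollectedBy the instrument
    "relatedIdentifiers": [link(INSTRUMENT_DOI, "IsCollectedBy", "Instrument")],
}

instrument = {
    "doi": INSTRUMENT_DOI,
    "types": {"resourceTypeGeneral": "Instrument"},
    # The reciprocal link: the instrument Collects the dataset
    "relatedIdentifiers": [link(DATASET_DOI, INVERSE["IsCollectedBy"], "Dataset")],
}
```

Recording the link in both records is optional; doing so simply lets the relationship be discovered starting from either DOI.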

Support for Pre-registrations and Registered Reports

Pre-registrations and registered reports are important for research reproducibility. Registering a study lends accountability and transparency to the hypothesis-generating and testing process. To support this growing practice, we have added a new term to resourceTypeGeneral, StudyRegistration, to help identify study registrations. The specific type of study registration can be specified in the free text resourceType field.
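As an illustration, the types object for a study registration pairs the new controlled term with a free-text description. The sketch below assumes the REST API JSON key names; the helper name and the example wording ("Preregistration", "Registered Report") are our own choices, not controlled values.

```python
# Hypothetical sketch: the "types" attributes for a study registration record.
# resourceTypeGeneral is the controlled term added in Schema 4.5; resourceType
# is free text, so the wording used here is only an example.


def study_registration_types(kind: str) -> dict:
    """Return a DataCite-style types object for a study registration."""
    kind = kind.strip()
    if not kind:
        raise ValueError("resourceType should describe the kind of registration")
    return {"resourceTypeGeneral": "StudyRegistration", "resourceType": kind}


prereg = study_registration_types("Preregistration")
report = study_registration_types("Registered Report")
```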

Identifiers for Publishers

The DataCite Metadata Schema supports connection metadata to associate DOIs with related works, people, and organizations. In Schema 4.5, the Publisher property has been updated to support identifiers for publishers. Repositories can now optionally specify a publisherIdentifier to unambiguously identify the publisher of a resource. Similar to affiliationIdentifier (introduced in Schema 4.3), publisherIdentifier can support any identifier scheme. We recommend using a ROR ID for an organization that is a publisher.
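For XML submissions, the identifier travels as attributes on the publisher element. The following Python sketch builds that element with the standard library's ElementTree, using DataCite's own ROR ID. The attribute names reflect our reading of the Schema 4.5 XML, so please check them against the documentation; the `publisher_element` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Schema 4.x namespace, plus the reserved xml: namespace used for xml:lang
NS = "http://datacite.org/schema/kernel-4"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", NS)  # serialize with a default xmlns, not ns0:


def publisher_element(name: str,
                      identifier: Optional[str] = None,
                      scheme: Optional[str] = None,
                      scheme_uri: Optional[str] = None,
                      lang: str = "en") -> ET.Element:
    """Build a <publisher> element; the identifier attributes are optional."""
    el = ET.Element(f"{{{NS}}}publisher")
    el.text = name
    el.set(XML_LANG, lang)
    if identifier:
        # This sketch insists on a scheme whenever an identifier is given
        if not scheme:
            raise ValueError("publisherIdentifierScheme is needed with an identifier")
        el.set("publisherIdentifier", identifier)
        el.set("publisherIdentifierScheme", scheme)
        if scheme_uri:
            # Spelled schemeURI in the XML (our reading of Schema 4.5)
            el.set("schemeURI", scheme_uri)
    return el


pub = publisher_element("DataCite", "https://ror.org/04wxnsj81", "ROR", "https://ror.org")
print(ET.tostring(pub, encoding="unicode"))
```

Because the identifier attributes are only added when supplied, the same helper also produces a plain, name-only publisher element.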

Definition Updates and New Guidance

With each schema version, we make updates to the documentation to clarify definitions and guidance. Among other changes in Schema 4.5, we updated the definition for the resourceTypeGeneral PhysicalObject to be more inclusive of samples. We also clarified the RelatedItem sub-property definitions and added a new guidance page, “Using RelatedItem for publication information and related resources”. A full list of documentation changes is included in the version update notes.

Updated Documentation Format

To date, we have always published the metadata schema documentation as a PDF document. As of Schema 4.4, this document was 82 pages! To make it easier to use the schema documentation on the web, we’ve migrated to Read the Docs for Schema 4.5. 

Here’s what you need to know about this change:

  • The schema website (https://schema.datacite.org) remains the official hub for DataCite Metadata Schema documentation and resources.
  • From the schema website, you’ll find the link to the documentation on Read the Docs from the Schema 4.5 page—click on “Access documentation”.
  • You can also export the documentation as a PDF file. Because the PDF is generated by Sphinx (the documentation generator powering Read the Docs), you’ll notice some formatting changes relative to past versions, including hyperlinks between sections of the PDF file.

This change will also make it easier for DataCite to make small corrections to the documentation in between versions. Note that the PDF file export contains the date that the documentation was last updated—so if we find any typos, you’ll notice the date at the top will be after the Schema 4.5 release date.

How Repositories Can Start Using Schema 4.5

The DataCite community can start using Schema 4.5 today! The following changes are supported across the REST API, MDS API, and Fabrica for creating and updating DOIs:

  1. When registering a DOI, you can use the new terms added to resourceTypeGeneral: Instrument and StudyRegistration. You can also use these terms in RelatedIdentifier and RelatedItem when indicating the type of the related resource.
  2. The new relationType options, IsCollectedBy and Collects, are supported for both the RelatedIdentifier and RelatedItem properties.
  3. Publishers can now have identifiers with the publisherIdentifier, publisherIdentifierScheme, and schemeUri attributes. Read more about this change in our support documentation on Schema 4.5 Publisher Changes.
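For REST API users, these changes come together in the JSON:API body used to create a DOI. The sketch below only builds and serializes the payload; it does not send it. The DOI, title, and creator are placeholders, the `create_doi_payload` helper is hypothetical, and the attributes shown are a minimal, illustrative set. The body would typically be POSTed to `https://api.datacite.org/dois` (or the test API) with the `application/vnd.api+json` content type.

```python
import json
from typing import Optional


def create_doi_payload(doi: str, title: str, publisher_name: str,
                       resource_type_general: str = "Instrument",
                       ror_id: Optional[str] = None) -> dict:
    """Build a JSON:API body for creating a DOI (hypothetical helper)."""
    publisher = {"name": publisher_name}
    if ror_id:
        # Optional Schema 4.5 publisher identifier, here a ROR ID
        publisher.update({
            "publisherIdentifier": ror_id,
            "publisherIdentifierScheme": "ROR",
            "schemeUri": "https://ror.org",
        })
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                "titles": [{"title": title}],
                "creators": [{"name": "Example Lab"}],  # placeholder creator
                "publisher": publisher,
                "publicationYear": 2024,
                # New Schema 4.5 terms: "Instrument" or "StudyRegistration"
                "types": {"resourceTypeGeneral": resource_type_general},
            },
        }
    }


payload = create_doi_payload("10.5072/example-instrument",
                             "Example spectrometer", "DataCite",
                             ror_id="https://ror.org/04wxnsj81")
body = json.dumps(payload, indent=2)
```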

If you’re submitting JSON via the REST API, metadata is automatically created using the latest version 4.x, which is now 4.5. 

For Fabrica users, the Schema 4.5 changes have been made available via the Create DOI form. When adding a Publisher identifier via the form, the suggested values will be from ROR.

For those submitting XML metadata, including MDS API users, we recommend specifying kernel-4 to automatically use the latest backward-compatible minor version. If you’re specifying a minor version (e.g., kernel-4.4), you’ll need to update this to either kernel-4.5 or kernel-4 to access the latest changes. This also applies when submitting XML via the REST API or Fabrica File Upload.
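If you maintain XML templates, it can help to check which kernel version each one pins. The Python sketch below reads the version from `xsi:schemaLocation` and rewrites a pinned 4.x minor version to kernel-4. It assumes locations follow the pattern `http://schema.datacite.org/meta/kernel-4.4/metadata.xsd`, and the helper names are hypothetical. The namespace itself, `http://datacite.org/schema/kernel-4`, is the same for every 4.x version.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"
KERNEL_XSD = re.compile(r"/meta/kernel-(\d+(?:\.\d+)?)/metadata\.xsd")


def declared_kernel(xml_text: str) -> Optional[str]:
    """Return the kernel version in xsi:schemaLocation, e.g. '4', '4.4' or '3'."""
    root = ET.fromstring(xml_text)
    match = KERNEL_XSD.search(root.get(XSI_SCHEMA_LOCATION, ""))
    return match.group(1) if match else None


def needs_attention(version: Optional[str]) -> bool:
    """Flag Schema 3 records and 4.x records pinned below 4.5."""
    if version is None:
        return True  # no recognizable schemaLocation: check by hand
    major, _, minor = version.partition(".")
    if major == "3":
        return True  # Schema 3 deprecation is scheduled for January 1, 2025
    return major == "4" and minor != "" and int(minor) < 5


def unpin_minor(xml_text: str) -> str:
    """Text-level rewrite of kernel-4.N/metadata.xsd to kernel-4/metadata.xsd."""
    return re.sub(r"(/meta/kernel-4)\.\d+(/metadata\.xsd)", r"\1\2", xml_text)


SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.4/metadata.xsd"/>"""
```

Rewriting at the text level leaves the rest of each file unchanged, which is why the sketch avoids re-serializing through ElementTree. Schema 3 records need a real migration rather than a changed location, so `unpin_minor` deliberately only touches 4.x.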

All of our DOI registration methods also support metadata updates. Repositories can update metadata for previously registered DOIs at any time. For example, repositories that have registered DOIs for instruments using resourceTypeGeneral="Other" can now update these to resourceTypeGeneral="Instrument". Making this change improves discoverability for Instrument DOIs and improves the overall quality of DataCite metadata.
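An update of this kind might look like the following Python sketch, which turns a record's current types into an update body for the REST API. It assumes the repository has already decided which records describe instruments (the sketch only checks that they are currently typed as Other), keeps any free-text resourceType, and uses a hypothetical helper name. The body would be sent with a PUT request to the DOI's URL under `https://api.datacite.org/dois/`.

```python
from typing import Optional, Tuple

API = "https://api.datacite.org/dois"


def instrument_update(doi: str, current_types: dict) -> Optional[Tuple[str, dict]]:
    """Return (url, body) to retype an 'Other' record as 'Instrument', else None."""
    if current_types.get("resourceTypeGeneral") != "Other":
        return None  # leave records that already use a specific type alone
    # Send only the two Schema properties; any other keys in a retrieved
    # types object are left out of the update in this sketch.
    new_types = {"resourceTypeGeneral": "Instrument"}
    if "resourceType" in current_types:
        new_types["resourceType"] = current_types["resourceType"]
    body = {"data": {"type": "dois", "attributes": {"types": new_types}}}
    return f"{API}/{doi}", body


update = instrument_update("10.5072/example-instrument",
                           {"resourceTypeGeneral": "Other", "resourceType": "Spectrometer"})
```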

For repositories using Schema 3, it’s important to begin using Schema 4 (which includes minor versions 4.0 through 4.5) before the upcoming deprecation of Schema 3 on January 1, 2025. Guidance on how to make the switch to Schema 4 is available on our support site, and we encourage you to contact support@datacite.org for assistance.

Retrieving Schema 4.5 Metadata via DataCite Services

Adding publisher identifiers necessitated some changes to our APIs and services which return JSON. Previously, “publisher” was a string value containing the publisher name. The new structure looks like this:

"publisher": 
  {
    "name": "DataCite",
    "schemeUri": "https://ror.org",
    "publisherIdentifier": "https://ror.org/04wxnsj81",
    "publisherIdentifierScheme": "ROR",
    "lang": "en"
  }

For metadata retrieval via the REST API, a URL parameter “publisher” is available to retrieve publisher identifiers. 
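In practice this is a query-string flag on the usual REST API URLs. The sketch below builds such a URL with the Python standard library, using this post's own DOI as the example. It assumes the parameter takes the value `true` (please confirm against the support documentation), and the `fetch_attributes` helper is hypothetical and is not run here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API = "https://api.datacite.org/dois"


def doi_url(doi: str, with_publisher_object: bool = True) -> str:
    """URL for one DOI's JSON; optionally ask for the publisher object."""
    url = f"{API}/{quote(doi, safe='/')}"
    if with_publisher_object:
        # Assumption: publisher=true returns "publisher" as an object
        url += "?" + urlencode({"publisher": "true"})
    return url


def fetch_attributes(doi: str) -> dict:
    """Fetch a DOI's metadata attributes (network call; not executed here)."""
    with urlopen(doi_url(doi)) as response:
        return json.load(response)["data"]["attributes"]
```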

DataCite Content Negotiation and the DataCite GraphQL API will be updated to support the new publisher structure on February 27, 2024. Breaking changes will apply to Content Negotiation requests for DataCite JSON and GraphQL API queries for publisher values only. For more information on what will change, refer to our support documentation on Schema 4.5 Publisher Changes.
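Until every integration has moved to the new structure, client code may encounter both shapes. A small tolerant accessor like the Python sketch below, with hypothetical helper names, reads the name from either the old string value or the new object and picks out a ROR ID when one is present. The sample object is the one shown earlier in this post.

```python
from typing import Optional, Union

Publisher = Union[str, dict, None]


def publisher_name(publisher: Publisher) -> Optional[str]:
    """Name from either the pre-4.5 string or the new publisher object."""
    if isinstance(publisher, str):
        return publisher
    if isinstance(publisher, dict):
        return publisher.get("name")
    return None


def publisher_ror(publisher: Publisher) -> Optional[str]:
    """ROR ID of the publisher, if the new object carries one."""
    if isinstance(publisher, dict) and publisher.get("publisherIdentifierScheme") == "ROR":
        return publisher.get("publisherIdentifier")
    return None


new_style = {
    "name": "DataCite",
    "schemeUri": "https://ror.org",
    "publisherIdentifier": "https://ror.org/04wxnsj81",
    "publisherIdentifierScheme": "ROR",
    "lang": "en",
}
old_style = "DataCite"
```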

What’s next

The DataCite team and the Metadata Working Group are working on upcoming schema versions. In the coming months, we’ll share some draft changes for community feedback as a Request for Comments (RFC). Our release schedule for specific changes in this RFC will depend on complexity and backward compatibility. For example, any major changes would form part of Schema 5, while small backward-compatible changes may be released earlier in Schema 4.6. We invite you to suggest changes to the DataCite Metadata Schema at any time for consideration.

If you have any questions or feedback about the Schema 4.5 updates, please get in touch with us at support@datacite.org.

To learn more about the schema changes, please also consider joining the upcoming webinar “Updating the DataCite Metadata Schema: Introducing Schema 4.5 and deprecating Schema 3” on March 13, 2024, 3pm (UTC).

Kelly Stathis
Technical Community Manager at DataCite
Cody Ross
Application Support Engineer at DataCite
Suzanne Vogt
Application Developer at DataCite
Kudakwashe Siziva
Application Developer at DataCite