Recap: Persistent Identifiers in Paris

In late September, DataCite and ePIC co-hosted a conference, Persistent Identifiers: Enabling Services for Data Intensive Research, in Paris on the Monday before the RDA Sixth Plenary meeting. It was a great way to kick off a busy week of data conversations, and it was most appropriate to start with persistent identifiers – after all, shouldn’t everything begin with persistent identifiers? All of the presentations from the DataCite / ePIC event are now available collectively and linked below individually.

In addition, we gathered questions from meeting participants via a Google doc and Twitter. Our speakers have taken the time to answer questions and those are also available. Last week we shared with you a short summary of the tweets, all using the hashtag #pid_paris. Here are a handful of the highlights from the day.

The day by the numbers:

  • 140 attendees
  • 18 countries represented
  • 13 speakers
  • 195 minutes of presentation time
  • 144 #pid_paris tweets
  • 1 reception with gorgeous views (while sipping champagne)

Our first session of the day focused on Persistent Identifiers for Interoperability and Services, which provided the opportunity for leading experts to share their knowledge on many foundational persistent identifier initiatives.

Geoff Bilder, CrossRef, and Martin Fenner, DataCite, joined forces to “bust some DOI myths” and used the help of unicorns to make their point. The myths they busted ranged from CrossRef and DataCite’s business models to costs, types of content and more. Bilder and Fenner stressed that it is important to think of a DOI as a combination of a persistent identifier, metadata, and a social contract in order to make it all work. Presentation: DOI Myths… busted [@]

Ulrich Schwardmann, ePIC, shared the ePIC approach of making all data in the research lifecycle citable. ePIC’s main objective is to enable data sharing. However, this is challenging in a data intensive research environment, where the automatic processing of data becomes a necessity. Presentation: ePIC – Persistent Identifiers for eResearch [@]

Next up, Larry Lannom, DONA, shared information on DONA – a recently formed international organization for managing the Handle System. One of the goals of the DONA Foundation is to decentralize the governance of the Global Handle Registry. Presentation: DONA Foundation, Administering the Global Handle Registry (GHR) [@]

John Kunze, California Digital Library, shared his thoughts on the importance of open identifier structures in his talk aptly titled “Names, Things, and Open Identifier Infrastructure: N2T and ARKs”. The ARK infrastructure is deeply decentralized and can serve as a model for the community. Presentation: Names, Things, and Open Identifier Infrastructure: N2T and ARKs [@]

Next up, Laura Paglione, ORCID, showcased a range of existing and new ORCID services designed to distinguish researchers. Currently there are 1.6 million ORCIDs (and growing), and 65% of those are created in the process of submitting manuscripts, grants, etc. One of ORCID’s new services supports researchers in describing their peer review activities while preserving anonymity when necessary. Presentation: Connecting people to their scholarly activity and outputs [@]

Anila Angjeli, ISNI, was the last speaker from the first session and spoke about ISNI’s work in managing identities. ISNI focuses on public identities for individuals and organizations and strives to be agnostic of domains, disciplines, roles, etc. Currently 8.46 million entities have been identified by ISNI – among which 2.55 million are researchers and 525,600 are organizations – impressive work. Presentation: Managing identities: Interconnecting research and other domains [@]

The second session of the day, Supporting Data Intensive Research, included six speakers discussing how they are using persistent identifiers in day-to-day operations to support researchers – it was great to see the steady and real progress the persistent identifier community has made in delivering persistent identifier based services to the research community.

Tobias Weigel, DKRZ, started us off with his talk about infrastructure to support data replication, versioning and collections. Tobias discussed the need for differentiated and clearly communicated quality levels for PIDs and stressed that whenever a PID exists in the wild we should not remove it even if the object has been removed. Presentation: PID usage at DKRZ, the role of RDA and ePIC policies [@]

Ann Cambon-Thomsen, BRIF, spoke on the identification of bio-resources and shared information on CoBRA – guidelines to standardize the citation of bio-resources in journal articles. Presentation: BRIF: Bioresource research impact factor (framework) [@]

Next up, Kerstin Lehnert, IGSN, presented on the challenges associated with uniquely identifying samples taken from our natural environment. One particular challenge is how to develop a core metadata schema with a controlled but flexible vocabulary. Presentation: IGSN: International Geo Sample Number. Unambiguous Citation of Physical Samples [@]

Peter Wittenburg spoke about EUDAT and PIDs and noted that data are very far from being considered published. Peter also spoke about the need for more harmonization and drew a distinction between descriptive metadata stored with the object and metadata stored as part of the PID record. (no slides)

Sünje Dallmeier-Tiessen, THOR, shared information on THOR, the Horizon 2020 project, which aims to provide seamless integration between articles, data and researchers across the research lifecycle. Technical and Human Infrastructure for Open Research (THOR) brings together data centers, publishers, and infrastructure service providers (DataCite and ORCID) to work on the harmonization of PID infrastructures, thereby creating interoperability – the “triangle of data, author and paper”. Presentation: Enabling services for data intensive research with THOR [@]

Jennifer Lin, Making Data Count (MDC), was our last speaker of the day. She shared interesting work from the NSF-funded data-level metrics project called Making Data Count. Jennifer noted that researchers consider citations to data the most valued metric, but that data citation suffers from many technical and cultural obstacles. The MDC project is addressing many of these key challenges. Presentation: Making Data Count [@]

After the formal session, the meeting participants walked down the hill for a reception at La tour Zamansky. After looking at the spectacular views and having a few sips of bubbly, we all dispersed with our heads full from a day of great conversations on persistent identifiers.

Paris view


Trisha Cruse
Executive Director at DataCite