As part of our 10-year anniversary, we want to tell you the story of how DataCite was founded 10 years ago. Therefore, we approached several people ‘who were there’ to tell you their part of the story. This is the first guest blog post by Jan Brase, currently Head of Research and Development at Göttingen State and University Library, who was DataCite’s executive officer for its first 5 years.
The year 2019 marks the 10-year anniversary of DataCite, and as someone who was there at the start, I see it as a great opportunity to look back on how it all began. Starting DataCite has been, without a doubt, a highlight, and I am still extremely grateful that I had the opportunity to build up something that changed academic publishing. Looking back, I also believe that the story of DataCite is a great example of how a simple idea can gain global momentum, just because the time was right and people were enthusiastic.
The general idea of treating scientific data as independently published digital objects arose around the end of the last millennium. The developments in publishing, the Internet in general, and the use of the Digital Object Identifier (DOI®) to link article citations motivated the discussion on applying the methods of managing electronic publications to research data, or more generally, to digital entities.
In 2003, I had the opportunity to work on a project at the German National Library of Science and Technology (TIB) that was funded by the German Research Foundation (DFG). The goal of the project was to look at ways to establish an infrastructure to assign identifiers to scientific data sets to make them citable. In the beginning it was still an open question what type of identifier to use and the DOI was just one of the possibilities. The partners in the project were three major German data centers and from the beginning the simple idea of establishing citable data sets was appealing. An important part of the approach was the inclusion of libraries as registration agencies, thereby establishing a service open to all disciplines.
The members of the initial DFG project that was the foundation for DataCite, in Hannover in 2007: Jens Klump (GFZ Potsdam), Michael Lautenschlager (DKRZ), Michael Diepenbroek (Pangaea), Jan Brase (TIB), Beate Hildebrand (DLR), Uwe Schindler (Pangaea), Heinke Hoeck (DKRZ), Irina Sens (TIB), Hannes Grobe (AWI Bremen)
In 2004 TIB became an official DOI Foundation Registration Agency and the first dataset was registered and included in TIB’s library catalog. The item is still available, the DOI is 10.1594/WDCC/EH4_OPYC_SRES_A2.
The next years were exciting, as we had to build up an infrastructure and define standards, schemata, and workflows. Many of these are still relevant today and used globally, but in those years, some decisions were just wild guesses. Some things were improvised: the registration infrastructure that was used from 2004 until 2010 was originally the outcome of the bachelor thesis of a student of mine. It was a prototype that successfully registered millions of DOI names from all over the world before DataCite had the resources for a more sophisticated technical approach.
What most of you might not believe or recall now is that at that time, ‘data’ was seen as an uninteresting topic. I still remember that after every presentation on our project (and I presented frequently), there were always two comments: first, ‘Why do you care about data? There is nothing interesting in data’, and second, ‘Why do you use DOIs? DOIs belong to publishers and are evil!’ Looking back, this is one of the most fascinating lessons: we held onto the topic and the DOI as identifier against all odds and sceptical comments from every direction, with the result that now, 15 years later, the usage of DOIs for data is globally accepted and a new generation of scientists is growing up for whom it is normal to have citable datasets.
Nevertheless, many scientists, even then, understood the benefits of having citable datasets, so we gained more and more data centers as customers and the number of registered datasets grew steadily. The TIB included data registration as an official service in its portfolio and I became head of the DOI registration agency at TIB. But with our growing success came new challenges: as science is global, the service of assigning DOIs to datasets cannot merely be a national service. However, when some foreign data centers showed interest in our service, they could not get funds to pay a German library for it. So it was clear that we needed a broader approach, especially as, during my visits to other European libraries, those libraries expressed their interest in offering their own DOI registration services. The idea was born to form an international alliance, and in 2007 the partners of the consortium TechLib (TU Delft Library, Library of the ETH Zürich in Switzerland and Technical Information Center of Denmark (DTIC)) expressed their general interest in becoming partners with TIB. But it was through two global organisations that these ideas suddenly became a reality.
The first one was the International Council for Scientific and Technical Information (ICSTI). In the summer of 2008, we were invited by ICSTI to attend their global conference in Seoul, Korea, and present our DOI service for data. I had the chance to present our ideas of building a global DOI service to senior representatives from science libraries. OSTI from the US, CISTI from Canada, INIST from France and the British Library expressed their interest in joining, and it was especially the British Library that really sped up the process. I still remember Adam Farquhar telling me at a meeting in London in December 2008: “We want to be part of this and we want to do it right”. So in March 2009, at a workshop on data citation jointly organized by ICSTI and CODATA in Paris, six libraries signed a Memorandum of Understanding to “establish a not-for-profit agency that enables organizations to register research datasets and assign persistent identifiers”. We were on track.
Signing the MoU to create an international DOI agency for datasets in March 2009 in Paris: Herbert Gruttemeier (INIST), Adam Farquhar (BL), Maria Heijne (TU Delft), Jan Brase (TIB), Wolfram Neubauer (ETH Zürich), Mogens Sandfaer (DTIC)
The second organisation was the Coalition for Networked Information (CNI), as the German DFG decided to send me to the CNI spring meeting to present our project on citable data sets. At this meeting, I approached the California Digital Library, the Purdue University Library and the Australian National Data Service, and they expressed interest in joining such a consortium.
On December 1st 2009 DataCite was founded in London. The rest is history.
Looking back, it is still remarkable how all of us came together and how we all had the feeling that this was the right thing to do.
Although it started small, and Salvatore Mele from CERN said during that time “DataCite is Jan and his laptop”, it would not have been possible without the vision of some German data scientists, the willingness of the DFG to fund the first steps, and the generosity of TIB in trying something new (and allowing me to work up to 80% of my time from anywhere in the world but a TIB office).
This is my version of what happened 10 years ago. I am looking forward to the other blog posts looking back at 2009 and hope to see many of you at the official birthday party in April in Philadelphia.
The first board of DataCite on December 1st, 2009: Alfred Heller (DTIC), Pam Bjornson (CISTI), Uwe Rosemann (TIB), President Adam Farquhar (BL), Patricia Cruse (CDL) and Jan Brase as the managing agent.
Representatives of the initial international partners in June 2009 in Germany at a meeting to establish the DataCite governance: Frits van Latum (TU Delft), Jeroen Rombouts (TU Delft), Alfred Heller (DTIC), Wolfram Neubauer (ETH Zürich), Jan Brase (TIB), Irina Sens (TIB), Herbert Gruttemeier (INIST), Karen Morgenroth (CISTI), Roland Lambert (INIST), Elizabeth Newbold (BL), Adam Farquhar (BL)