What is it?

Currently, the most widely applied practice, to publish data is alongside a research publication. The number of journals collaborating with data centers is increasing (e.g. Nature, The American Naturalist, Molecular Ecology, PNAS, PLOS), and the publication of associated data is becoming a common requirement. This is because data are acknowledged being a fundamental part of the research process and as important as discussions and conclusions derived from them. Data become publicly available, citable and uniquely identifiable via a persistent identifier (PID). Furthermore, the publication of data sets promotes transparency in the research life cycle, facilitates the verification and reproducibility of results and very likely increases your citation rate. Therefore, it is important to foster a practice, like geneticists already established with ‘Genbank’ (since 1982) which allows data sharing.
Another possibility for publishing data is to write a “data paper” describing your data and their potential usage in detail, and publish it in a designated data journal (examples of data journals: Biodiversity Data Journal, Genomics Data, Earth System Science Data). If you intend to publish data that are not connected to a research or data journal publication you can do so by publishing them directly via a data center.


How to do it?

  1. Generally, prefer journals and data centers, which promote data publication and encourage other data authors to cite data and to make their own data available for reuse.
  2. Prefer data centers and archives that provide a persistent identifier (e.g. DOI) to your data set when you submit them. Supplements to your research publication are no alternative because they are less visible and hard to cite because URLs might move.  
  3. When you publish your data, it does not imply that they are freely available from the beginning! You can decide if you want to impose an embargo for a pre-defined time on your data, before or even after academic publication. Only the metadata are visible during this time. (see also Fact-Sheet ‘Submit’)
  4. Specify in your publication where your data set can be discovered, accessed and visualized. Cross-reference the research article with the persistent identifier of the data set and the archived data set with the persistent identifier of your research paper.
  5. When you re-use a subset of other authors’ data, the reference to the associated persistent identifier may still not be precise enough. Use the finest-grained level of citation and inform the data center so that it can also link your publication to this data set.  
  6. Update your archived data sets as soon as newer versions are available.

Who does it?

Data should be published by all researchers producing data.

Key Elements

  • Get a persistent identifier (e.g. DOI) on your submitted dataset.
  • Impose an embargo period if necessary.
  • Request a link between academic publication and used data sets.
  • Cite properly.
  • Update your archived data sets.

GFBio Services

Data Publication

  • You get a persistent identifier (e.g. DOI) for your dataset when you submit it via GFBio to one of our associated data centers.
  • You can link your submitted data to associated projects, research publications and further datasets to increase its visibility.

Useful Links (Data Citation and Linking)

Analyze ← → Plan


Recommended citation:
German Federation for Biological Data (2021). GFBio Training Materials: Data Life Cycle Fact-Sheet: Data Life Cycle: Publish. Retrieved 16 Dec 2021 from