GFBio Data Centers

The ten Data Centers listed below are infrastructure partners in the GFBio broker network. They are departments of science informatics and data infrastructure at recognized science institutions devoted to managing, storing, archiving and publishing various types of bio- and geodiversity data. Data submitted through GFBio are transmitted to and curated by data curators at a matching GFBio Data Center, based on the profiles below.

The seven Collection Data Centers within GFBio regularly deliver standardised species occurrence data to the GFBio Data Portal. They also act as GBIF data publishers, and most of them are part of the German GBIF node system. They support data producers in Germany in publishing data through the Global Biodiversity Information Facility (GBIF).

Data Centers Specialized in Nucleotide, Plant and Environmental Data

The Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben and the German Plant Phenotyping Network (DPPN) have jointly initiated the Plant Genomics and Phenomics Research Data Repository (e!DAL-PGP) as an infrastructure to publish plant research data. e!DAL-PGP provides access to cross-domain, plant-related research data that cannot be accommodated by existing repositories because of its size or scope. e!DAL-PGP is registered as a research data repository at BioSharing.org and re3data.org, and at OpenAIRE as a valid EU Horizon 2020 open data archive.

Contact in GFBio context: Dr. Daniel Arend, Dr. Uwe Scholz, Dr. Matthias Lange  Contact e!DAL-PGP

Extended Profile e!DAL-PGP

Service Description

A desktop application is available to upload large datasets to e!DAL-PGP. An intuitive web interface is also provided for smaller publications. Users can authenticate with ELIXIR AAI, Google or ORCID accounts. Accepted data domains include, among others, image collections from plant phenotyping, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry as well as software and documents. All datasets must be supplied with technical metadata and are reviewed by two scientific reviewers and one administrative reviewer for (meta-)data quality and reusability. The user is guided through the review process by e-mail. Data submitters can define an optional embargo date. Every published dataset is referenced with a DOI to guarantee a FAIR-aware and long-term stable citation.

Citation: Arend et al. - PGP repository: a plant phenomics and genomics data publication infrastructure. Database. 2016. https://doi.org/10.1093/database/baw033

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes

Data Submission Formats

Data

No limitations, almost any file format is accepted

Metadata

No limitations, all (standardized or other) formats are accepted

Data Accessibility

Public access points

GFBio, PGP Repository Content Page of citable DOIs, DataCite Search

Long-term availability

Minimum 10 years

Data Publication Services

Data Citation DOI
Yes Yes

The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation as well as metadata (sample description, experimental setup) and interpreted information (annotations). ENA is developed and operated by the EMBL-European Bioinformatics Institute (EMBL-EBI), an academic research institute based in the UK and part of the European Molecular Biology Laboratory (EMBL). ENA is one of the three databases that make up the International Nucleotide Sequence Database Collaboration (INSDC).

Contact in GFBio context: Dr. Frank Oliver Glöckner, Dr. Ivaylo Kostadinov Contact ENA

Extended Profile ENA

Service Description

The GFBio Brokerage Service provides the timely, standards-compliant deposition of all molecular sequence data into the public repositories of the INSDC. The key components of the service include: (a) support for metadata standardization, curation and quality control, (b) negotiation of embargo periods, including communication with INSDC, (c) parallel submission of environmental metadata to PANGAEA, (d) cross-linking of sequence data and environmental data (PANGAEA) via accession number and DOI.

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes

Data Submission Formats

Data

Sequence data has to be in one of the formats supported by ENA.

Metadata

Molecular sequence metadata should comply with the “Minimum Information about any (x) Sequence” (MIxS) standards. It can be entered manually or uploaded either in GCDJ/JSON format or as a tab-separated values (TSV) file. Appropriate templates are available for all formats.
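
As a rough illustration of the tab-separated option, the following Python sketch writes a small metadata table with one row per sample. The terms `collection_date`, `geo_loc_name`, `lat_lon`, `env_broad_scale`, `env_local_scale` and `env_medium` are MIxS terms; the `sample_id` column, the selection of columns, the sample values and the function name `write_mixs_tsv` are assumptions made for this example. The templates provided by GFBio define the authoritative layout.

```python
import csv
import io

# Illustrative column set: one hypothetical identifier column plus a few
# MIxS terms. The actual template may require other or additional columns.
MIXS_COLUMNS = [
    "sample_id",        # hypothetical local identifier, not a MIxS term
    "collection_date",
    "geo_loc_name",
    "lat_lon",
    "env_broad_scale",
    "env_local_scale",
    "env_medium",
]

# Made-up sample values, for illustration only.
EXAMPLE_ROW = {
    "sample_id": "sample-001",
    "collection_date": "2019-06-15",
    "geo_loc_name": "Germany: Bremen",
    "lat_lon": "53.08 N 8.80 E",
    "env_broad_scale": "marine biome",
    "env_local_scale": "estuary",
    "env_medium": "sea water",
}


def write_mixs_tsv(rows, handle):
    """Write rows as tab-separated text, rejecting rows with empty fields."""
    writer = csv.DictWriter(
        handle, fieldnames=MIXS_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        missing = [name for name in MIXS_COLUMNS if not row.get(name)]
        if missing:
            raise ValueError("missing values for: " + ", ".join(missing))
        writer.writerow(row)


buffer = io.StringIO()
write_mixs_tsv([EXAMPLE_ROW], buffer)
print(buffer.getvalue())
```

Checking for empty fields before writing is a design choice of this sketch; it catches incomplete rows locally, before the file reaches a curator.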

Data Accessibility

Public access points

GFBio, ENA; data is exchanged daily with the other members of the INSDC: the DNA Data Bank of Japan (DDBJ) and the National Center for Biotechnology Information (NCBI)

Long-term availability

Unlimited

Data Publication Services

Data Citation DOI
No No

ENA issues Accession Numbers, which should be cited in publications that use the respective datasets. Accession Numbers are available at different levels of granularity (e.g. studies/datasets, samples, etc.). ENA recommends citing the study Accession Number (i.e. the dataset identifier) throughout the text of the publication; more details are given under Citing ENA Data.
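
To make the granularity levels more concrete, the following Python sketch maps an accession string to the level it most likely belongs to. The regular expressions reflect the commonly documented INSDC accession prefixes and should be treated as an assumption of this sketch rather than as an official validator; the level labels, the function name `accession_level` and the example accessions are made up for illustration.

```python
import re
from typing import Optional

# Assumed accession patterns per granularity level (labels chosen here).
ACCESSION_PATTERNS = {
    "project": re.compile(r"PRJ[EDN][A-Z]\d+"),
    "study": re.compile(r"[EDS]RP\d{6,}"),
    "biosample": re.compile(r"SAM[EDN][A-Z]?\d+"),
    "sample": re.compile(r"[EDS]RS\d{6,}"),
    "experiment": re.compile(r"[EDS]RX\d{6,}"),
    "run": re.compile(r"[EDS]RR\d{6,}"),
}


def accession_level(accession: str) -> Optional[str]:
    """Return the granularity level of an accession, or None if unrecognised."""
    candidate = accession.strip()
    for level, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return level
    return None


# Placeholder accessions, not references to real datasets.
for example in ["PRJEB12345", "ERP001234", "ERS123456", "not-an-accession"]:
    print(example, "->", accession_level(example))
```

A check of this kind could help authors confirm that the identifier cited in a manuscript is at the study level, as recommended above, and not a sample or run identifier.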

PANGAEA, the Data Publisher for Earth & Environmental Science, is a globally leading information system, long-term archive and data publisher for spatial geoscientific, biological and environmental data. Data published by PANGAEA originates from a broad range of subdisciplines of earth system research, such as biological sciences, chemistry and physics, with a special focus on earth sciences and environmental sciences. Jointly hosted by the Centre for Marine Environmental Sciences (MARUM) at the University of Bremen and the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI), PANGAEA is laid out as a permanent facility, guaranteeing the long-term availability and accessibility of archived data and metadata in secure and machine-readable formats. It is also a World Data Center (WDC-PANGAEA) and accredited by the ICSU World Data System.

Contact in GFBio context: Dr. Michael Diepenbroek, Dr. Robert Huber, Dr. Janine Felden Contact PANGAEA

Extended Profile PANGAEA

Service Description

The GFBio Brokerage Service provides the timely, standards-compliant deposition of all molecular sequence data into the public repositories of the INSDC. The key components of the service include: (a) support for metadata standardization, curation and quality control, (b) negotiation of embargo periods, including communication with INSDC, (c) parallel submission of environmental metadata to PANGAEA, (d) cross-linking of sequence data and environmental data (PANGAEA) via accession number and DOI.

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes

Data Submission Formats

Data

Preferably spreadsheets (CSV), databases or binary files; almost any file format is accepted

Metadata

All (standardized or other) formats are accepted

Data Accessibility

Public access points

GFBio, PANGAEA, Institutional landing pages of citable stable URIs, GBIF, OBIS, GEOSS and others

Standardised exchange formats

INSPIRE, ISO 19115, Darwin Core, Dublin Core

Data formats

ASCII, EXCEL, Darwin Core

Long-term availability

Unlimited, certified (WDS & CTS) long term archive

Data Publication Services

Data Citation DOI
Yes Yes

Data Centers at Natural Science Collections

The Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, with its Research Group Biodiversity Informatics, is a centre of biodiversity research in Europe, housing extensive scientific collections of herbarium specimens (about 3.5 million), one of the world's largest living plant collections, a DNA Bank as well as the most complete botanical library in Germany.

Contact in GFBio context: Anton Güntsch, David Fichtmüller and Katja Luther Contact the BGBM GFBio team

Extended profile BGBM

Service Description

Data archiving for research projects focuses on

  • Type 1a Botanical specimen data; (Physical) DNA Storage (under discussion at BGBM); Referenced multimedia objects
  • Type 1b Botanical observational data; Referenced multimedia objects
  • Type 2 Botanical systematics and monographic works
  • Type 4 RAW data (data sets and/or data packages) only if well documented and in formats and structures appropriate for long-term archiving, without further data management required

Preference is given to data that fall within the geographic and taxonomic research foci of the BGBM (BGBM Extended Profile, BGBM: Research). Data archiving and publication include management processes with JACQ, the reBiND workflow and the EDIT Platform for Cybertaxonomy, as well as the data quality service platform and the transformation and import services provided by the BGBM.

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes Yes

Data Submission Formats

Data

Preferably via the BGBM collection data forms for Botanical collections, DNA sample collections and/or Tissue collections, which can be found in the GFBio collection of recommended data submission templates, and/or standardized formats of any kind (e.g. ABCD or DwC-A files); Spreadsheets (CSV, Excel files, image files); Export files from external EDIT platform installations

Metadata

EML, ABCD, DarwinCore, DublinCore, SDD

Data Accessibility

Public access points

GFBio, BioCASe Data Access Services at the BGBM, Institutional landing pages of citable stable URIs, BiNHum, BioCASE, GBIF, Europeana and others

Standardised exchange formats

XML-files in ABCD, DarwinCore, EML, SDD standard; Web services of the EDIT Platform for Cybertaxonomy

Data formats

TXT, CSV, XML

Long-term availability

Unlimited

Data Publication Services

Data Citation DOI
Yes Yes
via GBIF publication
via ZB MED/DataCite publication

The Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Braunschweig with Database and IT-Department is one of the largest biological resource centers worldwide. Its culture and tissue collections and DNA Bank currently comprise almost 40,000 items, including about 20,000 different bacterial and 5,000 fungal strains, 700 human and animal cell lines, 800 plant cell lines, 1,000 plant viruses and antisera, and 4,800 different types of bacterial genomic DNA.

Contact in GFBio context: Prof. Dr. Jörg Overmann and Christian Ebeling Contact the DSMZ GFBio team

Extended profile DSMZ

Service Description

Data archiving for research projects focuses on

  • Type 1 Data accompanied by the deposit of a biological resource within the DSMZ collections (Deposit in the DSMZ)
  • Type 2 Data describing microbial diversity (e.g. taxonomic classification, morphology, physiology, cultivation, origin, natural habitat), according to our profile description (DSMZ Extended Profile, DSMZ)

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes Yes

Data Submission Formats

Data

Via the DSMZ online accession form or spreadsheets (CSV), preferably standardized formats of any kind, MySQL dumps
Example templates for data submission can be found in the GFBio collection of recommended data submission templates

Metadata

EML, ABCD, DarwinCore, MCL

Data Accessibility

Public access points

GFBio, BioCASe Data Access Services at the DSMZ, GBIF, BacDive

Standardised exchange formats

XML-files in ABCD, DarwinCore, EML standard. Web services of the DSMZ accession platform and BacDive platform

Data formats

Text, CSV, XML

Long-term availability

Unlimited

Data Publication Services

Data Citation DOI
Yes Yes
via GBIF publication

The Museum für Naturkunde (MfN), Leibniz Institute for Research on Evolution and Biodiversity, Berlin is a research museum within the Leibniz Association. It is one of the most significant research museums worldwide focusing on biodiversity, evolution and geosciences. The zoological, paleontological and mineralogical collections of the Museum are directly linked to research and comprise more than 30 million items. In addition, the Museum has an Animal Sounds Archive containing approximately 120,000 animal sound recordings and a DNA Bank. The Library of the MfN is one of the most important reference libraries in zoology in the German-speaking world. Research at the Leibniz Institute for Research on Evolution and Biodiversity is organised in four Science Programmes ("Forschungsbereiche"): Evolution and Geoprocesses, Collection Development and Biodiversity Discovery, Digital World and Information Science, Public Engagement with Science.

Contact in GFBio context: Dr. Sabine von Mering and Falko Glöckler Contact the MfN GFBio team

Extended profile MfN

Service Description

Data archiving for research projects focuses on

  • Type 1 Occurrence data and associated media originating from zoological, environmental and paleontological studies.
  • Type 2 Independent of scientific domain, a data-type-driven focus is on archiving multimedia objects, traits, taxonomic and observation data, e.g. originating from Citizen Science initiatives (MfN Extended Profile).

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes Yes

Data Submission Formats

Data

Pre-structured, delimited plain-text files (CSV) and spreadsheet files (Microsoft Excel, Open Document Spreadsheet); SQL dump files; Binary files (e.g. images, audio, volume data); All submissions preferably in open formats
Example templates for data submission can be found in the GFBio collection of recommended data submission templates

Metadata

EML, ABCD, DarwinCore

Data Accessibility

Public access points

GFBio, BioCASe Data Access Services at the MfN, Institutional landing pages of citable stable URIs, BioCASe, GBIF, BiNHum, GeoCASE, Europeana

Standardised exchange formats

XML-files in ABCD, DarwinCore, EML standard

Data formats

Text, CSV, XML

Long-term availability

Unlimited (minimum guaranteed time period of 15 years)

Data Publication Services

Data Citation DOI
Yes Yes

The Senckenberg Gesellschaft für Naturforschung (SGN) conducts research in bio- and geosciences within six research institutes and three natural history museums in Germany. The mission of the SGN is to make science and scientific findings accessible to the public through teaching, publishing, museums and special exhibitions in Frankfurt, Dresden, Görlitz and Tübingen. Senckenberg's research activity is divided into four large research fields: Biodiversity, Systematics and Evolution; Biodiversity and Environment; Biodiversity and Climate; and Biodiversity and Earth System Dynamics.
With about 40 million objects in currently more than 200 collections, the SGN has one of the largest scientific collections in Germany. The holdings include a herbarium and zoological, anthropological, paleontological and mineralogical collections, plus a DNA Bank. SGN is part of the Leibniz Association.

Contact in GFBio context: Anke Penzlin Contact the SGN GFBio team

Extended profile SGN

Service Description

Data archiving for research projects focuses on botanical, zoological and anthropological data, according to our profile description (SGN Extended Profile).

  • Type 1a Collection data, together with the deposit of physical objects, referenced multimedia objects.
  • Type 1b [submission of this data type is currently not possible] Observation and occurrence data, species monitoring projects, referenced multimedia objects.
  • Type 3 [submission of this data type is currently not possible] Any type of triple-structured data, referenced multimedia objects.

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes Yes

Data Submission Formats

Data

Flexible; Preferably standardized formats like ABCD, EML; Spreadsheets (CSV)
Example templates for data submission can be found in the GFBio collection of recommended data submission templates

Metadata

EML, ABCD, DarwinCore

Data Accessibility

Public access points

GFBio, BioCASe Data Access Services at the SGN, Institutional landing pages of citable stable URIs, BioCASE, GBIF, SeSam/AQUiLA

Standardised exchange formats

XML-files in ABCD, DarwinCore, EML standard

Data formats

Text, CSV, XML

Long-term availability

Unlimited (minimum guaranteed time period of 10 years)

Data Publication Services

Data Citation DOI
Yes Yes
via GBIF publication
via DataCite publication

The State Museum of Natural History Stuttgart is one of two State museums in Baden-Württemberg, southern Germany. With its important zoological, paleontological and mineralogical collections and its herbarium, containing more than 11 million specimens (fossils, minerals, plants, insects, molluscs, and vertebrates), the museum possesses an excellent foundation for biosystematic research. Due to its diverse international scientific contacts and relations, the natural history museum significantly contributes to the identity of the State of Baden-Württemberg.

Contact in GFBio context: Dr. Joachim Holstein and Dr. Juan Carlos Monje Contact the SMNS GFBio team

Extended profile SMNS

Service Description

Data archiving for research projects focuses on botanical, zoological and paleontological data, according to our profile description (SMNS Extended Profile).

  • Type 1a Collection data, together with the deposit of physical objects, referenced multimedia objects.
  • Type 1b Observation and occurrence data, species monitoring projects, referenced multimedia objects.
  • Type 2 Taxon reference list data and checklist data.

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes Yes

Data Submission Formats

Data

(a) Export files from external installations of DiversityCollection, DiversityTaxonNames, (b) any spreadsheets (CSV, Excel files), structured according to existing DWB import schemes (see SMNS GitHub, SNSB GitHub and ZFMK GitHub), (c) spreadsheets and databases appropriate to create new DWB import schemes; Image formats have to be agreed upon for submission
Example templates for data submission can be found in the GFBio collection of recommended data submission templates

Metadata

EML, ABCD, DarwinCore

Data Accessibility

Public access points

GFBio, BioCASe Data Access Services at the SNSB, BioCASe, GBIF, BiNHum, DTN Taxon List Services and others

Standardised exchange formats

XML-files in ABCD, DarwinCore, EML standard; Web services of the DWB platform

Data formats

Text, CSV, XML

Long-term availability

Unlimited (minimum guaranteed time period of 15 years)

Data Publication Services

Data Citation DOI
Yes Yes
via GBIF publication

The Staatliche Naturwissenschaftliche Sammlungen Bayerns, München with SNSB IT Center are a research institution for natural history in Bavaria. They encompass five State Collections (zoology, botany, paleontology and geology, mineralogy, anthropology and paleoanatomy), the Botanical Garden Munich-Nymphenburg and eight museums with public exhibitions in Munich, Bamberg, Bayreuth, Eichstätt and Nördlingen. Our research focuses mainly on past and present bio- and geodiversity and the evolution of animals, fungi and plants. To achieve this, we have large zoological, anthropological, paleontological and mineralogical collections and a herbarium (almost 35,000,000 specimens) as well as a DNA Bank.

Contact in GFBio context: Dr. Dagmar Triebel and Tanja Weibulat Contact the SNSB GFBio team

Extended profile SNSB

Service Description

Data archiving for research projects focuses on botanical, mycological, zoological and paleontological data. Data archiving includes management processes involving Diversity Workbench (DWB) databases. Data publication is done via the DWB network at the SNSB (SNSB Extended Profile).

  • Type 1a Collection data, together with the deposit of physical objects, referenced multimedia objects.
  • Type 1b Observation and occurrence data, species monitoring projects, referenced multimedia objects.
  • Type 2 Taxon reference list data and checklist data.
  • Type 3 Any type of triple-structured data, referenced multimedia objects.
  • Type 4 RAW data (data sets and/or data packages) if well documented and in formats and structures appropriate for long-term archiving, without further data management requirements.

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes Yes

Data Submission Formats

Data

(a) Export files from external installations of DiversityCollection, DiversityTaxonNames and DiversityDescriptions, (b) any spreadsheets (CSV, Excel files), structured according to existing DWB import schemes (see SMNS GitHub, SNSB GitHub, ZFMK GitHub or example templates for data submission in the GFBio collection of recommended data submission templates), (c) any spreadsheets and databases in an accessible (not legacy) format appropriate to create new DWB import schemes; Image formats have to be agreed upon for submission

Metadata

EML, ABCD, DarwinCore, DublinCore, SDD

Data Accessibility

Public access points

GFBio, BioCASe Data Access Services at the SNSB, Institutional landing pages of citable stable URIs, BioCASe, GBIF, BiNHum, DTN Taxon List Services and others

Standardised exchange formats

XML-files in ABCD, DarwinCore, EML, SDD standard; Web services of the DWB platform

Data formats

Text, CSV, XML; Agreed image data formats

Long-term availability

Unlimited, with tape archiving support through LRZ

Data Publication Services

Data Citation DOI
Yes Yes
via GBIF publication
via ZB MED/DataCite publication

The LIB – Leibniz Institute for the Analysis of Biodiversity Change carries out species-related biodiversity research and ensures the transfer of knowledge to researchers and the general public.
Core stocks are the zoological, geological-palaeontological and mineralogical collections of more than 16 million specimens, tissue collections and a DNA Bank. The research focusses on biodiversity and its changes. In order to better understand the current mass extinction of flora and fauna, researchers are looking for connections and causes of – often – man-made changes. The goal is to develop solutions for the preservation of ecosystems and species in order to maintain the basis of current life.
The results of research and the collections are made accessible to the public with permanent and temporary exhibitions and using other methods for public education.

Contact in GFBio context: Dr. Peter Grobe, Birgit Klasen Contact the LIB GFBio team

Extended profile LIB

Service Description

Data archiving for research projects focuses on terrestrial animals (LIB Extended Profile). Data archiving includes management processes involving Diversity Workbench (DWB) databases and Morph∙D∙Base.

  • Type 1a Collection data, together with the deposit of physical objects, referenced multimedia objects.
  • Type 1b Observation and occurrence data, species monitoring projects, referenced multimedia objects.
  • Type 2 Taxon reference list data and checklist data.
  • Type 3 Tissue/DNA biobank storage, referenced multimedia objects (e.g. sound data, 3d image stacks/volume data).

Service Levels

Data Set Data Package Data Management Research Objects
Yes Yes Yes Yes

Data Submission Formats

Data

(a) Export files from external installations of DiversityCollection, DiversityTaxonNames

(b) any spreadsheets (CSV, Excel files), structured according to existing DWB import schemes (see ZFMK GitHub, SMNS GitHub and SNSB GitHub)

(c) spreadsheets and databases appropriate to create new DWB import schemes

Image formats have to be agreed upon for submission.

Example templates for data submission can be found in the GFBio collection of recommended data submission templates

Metadata

EML, ABCD, DarwinCore, GGBN, SDD, DublinCore

Data Accessibility

Public access points

GFBio, BioCASe Data Access Services at the LIB, Institutional landing pages of citable stable URIs, GBIF, DTN Taxon List Services, Morph∙D∙Base and others

Standardised exchange formats

XML-files in ABCD, DarwinCore, EML, SDD standard; Web services of the DWB platform

Data formats

TXT, CSV, XML; Original and derivative image, audio, and sound data formats

Long-term availability

Unlimited (minimum guaranteed time period of 10 years)

Data Publication Services

Data Citation DOI
Yes Yes
via GBIF publication
via ZB MED/DataCite publication