The National Research Data Infrastructure (NFDI) - Position paper by the German Federation for Biological Data (GFBio)
The National Research Data Infrastructure (NFDI) - a Cornerstone for Biological and Biodiversity research
Position paper by the German Federation for Biological Data (GFBio)
In 2016, the National Council for Scientific Information Infrastructures (RfII) introduced a far-reaching perspective for the development of a sustainable National Research Data Infrastructure (NFDI) in Germany. In subsequent reports and publications specific aspects and a vision were outlined that meet the requirements of digital science for the next decades. Key recommendations are to organize the existing infrastructural landscape into consortia and to name scientific communities as the key drivers to gear the development of the NFDI.
One of the possible building blocks for NFDI is the German Federation for Biological Data (GFBio), which is a consortium of 20 institutions in Germany comprising domain-specific data centers, museums, collections, and research facilities. The DFG-funded project aims at establishing a federated infrastructure for biological data and follows a holistic approach encompassing technical, organizational, financial, and cultural aspects. GFBio has been running for five years, is fully operational, and is currently in a consolidation phase. The consortium has set up a charitable association as a legal entity. Services supplied include data submission, long-term archiving, publication, a data portal, and a web-based tool for visualization and analysis (VAT) as well as a terminology service (TS). As a unique selling point, GFBio enables uniform access to environmental (PANGAEA), sequence (EMBL-EBI/SILVA), biodiversity, and collection data (e.g. processed and manually curated by systems like DWB and BEXIS 2).
Federation and Fragmentation
Federated data infrastructures like GFBio mostly build on existing structures and developments of various institutions. Naturally, they substantially conserve work that has been invested in the past and benefit from the expertise, innovations, and resources of consortium members. Moreover, each of the different partners is widely interlinked with international activities and developments. However, a substantial amount of time and effort has to be invested in effective organization. Workflows, standards, interfaces, and resources have to be aligned and coordinated. Time is needed to cope with the fragmented landscape and to agree on and develop the necessary commonalities - more time than is usually given by traditional project-based funding regimes. GFBio therefore strongly welcomes the concept of NFDI as a long-term measure.
Semi-Automated Workflows for high-quality Data
RfII clearly emphasizes the need for quality data and services. In particular, for the emerging landscape of cloud-based data service platforms such as GBIF, DataONE, EOSC, or GEODAB, easy-to-use, integrated, and reliable high-quality data are needed. This requires certified services and harmonization of data structures and semantics. Along the same lines, the FAIR Data Publishing group emphasizes machine readability of data as one of the major challenges. This can only in part be achieved through sophisticated systems and automation. Predominantly, manual curation by domain experts is needed to meet the special requirements of the different research fields. This is a long-term investment that is currently not funded. RfII’s recommendation for a bold investment in human resources is very welcome in this respect.
Integrating with Research Practice
In fact, data management should be seen as an integral part of research and research funding. However, in the past there was almost a polarization between science and the development of data infrastructures. To improve the connection with science, we need top-down (e.g. policies) and bottom-up (incentives) measures and developments on different levels, including means to leverage a cultural change. Again, this needs significantly more time than the technical implementation of data services. Scientists still have insufficient awareness of quality data infrastructures and instead very often use simple repository services that are not FAIR. This insufficient awareness is also due to the fact that qualified personnel with expertise in data science are scarce. GFBio - like state-of-the-art research in general - needs these ‘hybrids’ linking the two worlds of science and IT. Data scientists not only have the capabilities to make the best use of supplied services but also push the development of research data infrastructures. The situation requires changes in curricula, which is out of scope for GFBio and other similar projects. Consequently, RfII recommends this to be covered by the NFDI.
The most urgent problem for project-funded federated infrastructures like GFBio is sustainability. Permanent resources are needed not only for the operation of the common services, in particular curation, but also for the further evolution of the infrastructure. Given the fast pace of development in IT, continuous adaptation is required. GFBio favours a mixed ‘business’ model composed of a fixed and a demand-oriented funding part, a model which has been shown to be successful in open source software development. Fixed funding is currently assumed to be covered by in-kind commitments of participating institutions (essentially basic services). The demand-oriented part, that is, data management as a funded part of research, is seen as crucial for the data services to adapt to the needs of science. However, although funding policies are in place and supported by review boards as well as commissions, implementation is still at an early stage. In this respect, there is an urgent need to develop reliable new funding models that do not impose practical hurdles.
Should GFBio fail, this will not be due to a lack of quality of services, but will rather be due to a lack of adequate long-term resources. The sustainability problem is addressed by GFBio but is unlikely to be finally solved within the remaining funding period. Here, we clearly see the responsibility and task of the NFDI. GFBio together with its partners and its community network is committed to strongly contributing to the success of NFDI.
Michael Diepenbroek, Coordinator GFBio - firstname.lastname@example.org
Board of GFBio e.V. - email@example.com
2 Rat für Informationsinfrastrukturen (RfII) Recommendations 2016: Performance through Diversity, http://www.rfii.de/?wpdmdl=2075
4 eingetragener Verein, https://www.gfbio.org/gfbio_ev
16 The FAIR Guiding Principles for scientific data management and stewardship. – Sci. Data 3:160018, https://doi.org/10.1038/sdata.2016.18
17 compare: Royal Society (2012) Science as an open enterprise: open data for open science, https://royalsociety.org/~/media/policy/projects/sape/2012-06-20-saoe.pdf
18 e.g. DFG (2015) Leitlinien zum Umgang mit Forschungsdaten in der Biodiversitätsforschung, http://www.dfg.de/download/pdf/foerderung/antragstellung/forschungsdaten/richtlinien_forschungsdaten_biodiversitaetsforschung.pdf
25 October 2018
de.NBI/GFBio Summer School on Data Management
The German Federation for Biological Data (GFBio) and the German Network for Bioinformatics Infrastructure (de.NBI) organized a joint summer school for the first time to support researchers in managing their scientific data. Eighteen young scientists met at the Braunschweig Centre for Systems Biology from 03 to 07 September 2018 to attend “Riding the Data Life Cycle”, a title that refers to the OECD recommendation that data from publicly funded research must be publicly available.
The aim of the summer school was to provide a practical toolbox for the collection, maintenance, documentation, archiving and publication of research data according to the FAIR data principles (Findable, Accessible, Interoperable and Re-usable). Practical training was also integrated to evaluate data for reuse.
The GFBio Data Management Plan Tool is OUT!
The new GFBio Data Management Plan Tool (DMPT) has been available since July 19, 2018. By adding this tool to its service portfolio, GFBio can directly help you to prepare your personal data management plan.
After submitting the DMPT questionnaire, you immediately connect with our data management experts. They are able to identify suitable data archives for your data, give you advice concerning data management policies and data access rules, estimate data management costs and efforts, and provide a roadmap for the long-term availability of your data. Furthermore, GFBio offers reviewing of your DMP regarding any special requirements of your funding agency.
By writing a Data Management Plan (DMP) you save a lot of time in the long run; to begin with, you create awareness of potential data management obstacles. In fact, a well-structured DMP clarifies how and what data will be created, processed and documented. It also addresses the means of data archiving and publication with regard to costs and access conditions. In short, the DMP helps you ask the right questions concerning data management, and is likely to enhance the transparency and the integrity of your work. In addition, DMPs are increasingly required as a mandatory part of proposals by research institutions and funding agencies.
We have developed the DMPT for individual scientists (during all stages of their career), research groups, and for large research projects – you can use the tool to send your DMP to our experts for support and revision; save your personal DMP and come back to it later; or create a pdf version of your work to print out. You choose!
Do have a look at our training materials if you want to learn more about data management planning, and don't hesitate to contact us at info[at]gfbio.org if you have any questions.
We are looking forward to helping you with your DMP!
GFBio Retreat in Erfurt - we are ready for the third phase!
This year’s GFBio Retreat brought together the GFBio Steering Committee and the Strategic Advisory Board in a collaborative forum to present and discuss ideas on specific needs, gaps, and opportunities to establish GFBio as a permanent infrastructure network.
Particular focus was set on how to foster successful participation in the National Research Data Infrastructure (NFDI) and optimize the existing service lines. The discussion on how to advance a strategy and high-level tactics to increase the visibility of GFBio within the scientific community culminated in a list of practical actions to achieve that goal.
The retreat proved highly beneficial for an open assessment of internal capabilities and performance, and laid out a roadmap to effectively move GFBio through the third phase (starting in August 2018) and beyond.
Let the third phase begin!
Press Release: DFG funds the national contact point for scientific data management through 2021
Data provide the foundation for empirical science. Datasets are also becoming more expansive and complex in the field of environmental sciences. At the same time, however, they are opening up new possibilities, for example, when older datasets are applied in combination with new analytical tools. The German Federation for Biological Data (GFBio) project, funded by the German Research Foundation (DFG – Deutsche Forschungsgemeinschaft), and coordinated at MARUM – Center for Marine Environmental Sciences at the University of Bremen, will improve the management of research data and enhance the exchange of data among researchers. Starting in August, the DFG will fund the third phase of the project with around 4.3 million Euros.
“GFBio helps to solve a central problem of current research: to make research data accessible over the long term and to enable better science,” says Dr. Michael Diepenbroek, manager of the PANGAEA data center where the GFBio project is coordinated. “We bring together collection, genome, and environmental data. This represents a great potential for science.” Through the search function in the portal almost 5.5 million data entries are already available to scientists for use in their research from the eight affiliated GFBio data centers.
Data from publicly financed research should be widely and freely accessible, both the raw data produced during the research process itself, as well as the metadata, which describe the conditions and procedures that were used to obtain the data. This is where the GFBio project comes in. 19 partners from throughout Germany participate in the project, including universities, museums, and molecular-biological archives. The project participants are committed to the principles of "FAIR Data". In this context, FAIR stands for “Findable, Accessible, Interoperable, and Re-usable”.
GFBio serves as a national contact, information and consulting center for all questions concerning the standardization and management of biological research data throughout the entire life cycle of the data, i.e., from data collection and archiving to publication. During the conception and implementation phase of the project, the data portal www.gfbio.org was created. The third phase, which begins in the summer, will be primarily dedicated to improving the services offered in cooperation with the scientists, and to building a sustainable research data infrastructure. To furnish this process with a legal structure, the non-profit organization GFBio e.V. was also established.
The total funding amount for the coming three years is 4,370,000 Euros. The entire project has been coordinated at MARUM at the University of Bremen since the beginning of the first funding phase in 2013. This means that the data are published and that environmental data are better connected to molecular data.
Dr. Michael Diepenbroek
Telephone: 0421 218 65590
MARUM Press and Public Relations
Telephone: 0421 218 65540
The “German Federation for Biological Data” (GFBio) project has been funded by the German Research Foundation since 2013 and it presently has 19 scientific partners. As an infrastructure, GFBio addresses the research data management requirements in the areas of biological and environmental data from research institutes, individual researchers, German natural history museums, developers of scientific software, and from large research projects, groups and networks. The services offered by GFBio cover the complete life cycle of data from the collection of raw data to the publication of scientific articles. Additionally, researchers are supported in the preparation of data management plans, which have become an obligatory part of research proposals in many fields. Here, GFBio acts as a central contact point for the long-term archiving of biological and environmental data. Datasets submitted to GFBio are deposited in one or more of the eight data centers and linked with one another through so-called “persistent identifiers” (for example, DOIs – “digital object identifiers”). In order to ensure the necessary performance of its tasks beyond the project period, the registered non-profit organization GFBio e.V. was established in 2016 as a legal entity and central point of contact.
PANGAEA is operated by the Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung (AWI) and MARUM – Center for Marine Environmental Sciences at the University of Bremen. PANGAEA is open to any project, institution or individual researcher who uses, archives or publishes data. The PANGAEA information system is an open-access library that archives, publishes and distributes environmental and biodiversity data from Earth-system research. Most of the data, depending on licensing and moratorium restrictions within certain research fields, are openly and freely available.
Using state-of-the-art methods and through participation in international projects, MARUM investigates the role of the ocean in the Earth’s system, particularly with respect to global change. It quantifies the interactions between geological and biological processes in the ocean and contributes to the sustainable use of the oceans. MARUM comprises the DFG Research Centre and the Excellence Cluster “The Oceans in the Earth System”.
New title under the auspices of GFBio
“A generic workflow for effective sampling of environmental vouchers with UUID assignment and image processing” is the latest article published with the support of the GFBio project. The authors present a workflow for sampling biological and environmental vouchers in the field, while simultaneously generating universally identifiable data. The article highlights GFBio's importance as an adviser and facilitator of data management best practices.
Enjoy your reading!
After riding the wave, it is time to navigate through the data jungle
GFBio has contributed to the recently published Knowledge Exchange (KE) report about the evolving landscape of Federated Research Data Infrastructures (FRDI). This document draws on the work of the Knowledge Exchange Research Data expert group, which recognized the need to better understand the dynamics and consequences of ‘federating’ research data infrastructures. It also follows the KE report from 2011, prepared in response to the European Commission ‘Riding the wave’ call for the development of an international framework towards a collaborative data infrastructure.
The report is based on a collection of interviews with experts leading or managing sixteen FRDIs in all six KE partner countries, i.e. Denmark, Finland, France, Germany, the Netherlands and the UK (table 1).
The analysis of these interviews yielded nine main conclusions relating to a variety of factors: the definition, characteristics, and drivers for the emergence of FRDIs, the funding situation, the role of users, the complexity of research environments, challenges and impacts, and the importance of the European Open Science Cloud (EOSC).
This study, authored by Stéphane Goldstein, has been available since November 2017. It surely adds momentum to the transition towards data-driven research and is a good starting point for anyone interested in understanding the present motivation for pushing forward a federated data infrastructure and the implications thereof.