Data management: new tools, new skills
“It may seem boring, but it is essential.” (1)
That is how a slightly sardonic editorial in Nature summed up the challenge facing researchers who are asked to produce a data management plan (DMP), now a required deliverable for national and European research funding bodies.
A few figures illustrating the challenge:
- According to a study published in 2015, only 8% of physics articles and 5% of chemistry articles make their associated data available. (2)
- 50% of experiments are considered non-reproducible. (3)
- 80% of the data produced in the last 20 years may well be lost. (4)
- 93% of higher education institutions have no procedure for research data management plans. (5)
- 90% of researchers questioned in a European survey (6) say they store, archive or share their data themselves.
- 33% of the same researchers have never heard of data management plans or believe they do not need them. (7)
- More than 80% of the data produced are stored outside repositories. (8)
Since 2019, laboratories funded by the ANR (the French National Research Agency) must provide a data management plan within six months of the project start date. For projects lasting more than 30 months, the document must be updated half-way through the project, before the final version is submitted. A different DMP template may be used for international partnerships.
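As an illustration, the ANR schedule can be turned into concrete dates. The Python sketch below is hypothetical and is not an ANR tool: it assumes that the final version is due on the project end date and that "half-way" means the start date plus half the project duration in whole months (rounded down). Check your grant agreement for the actual deadlines.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`.

    If the target month is shorter, the day is clamped to its last day
    (e.g. 31 August + 6 months gives 28 or 29 February).
    """
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def dmp_milestones(start: date, duration_months: int) -> dict[str, date]:
    """Hypothetical helper listing DMP deadlines for an ANR-funded project.

    Rules taken from the text: initial DMP within six months of the start,
    mid-term update for projects lasting more than 30 months.
    Assumptions (not stated by the ANR text above): "half-way" is
    start + duration_months // 2, and the final version is due at the end.
    """
    milestones = {"initial": add_months(start, 6)}
    if duration_months > 30:
        milestones["mid_term_update"] = add_months(start, duration_months // 2)
    # Assumed: final version due on the project end date.
    milestones["final"] = add_months(start, duration_months)
    return milestones


if __name__ == "__main__":
    for name, deadline in dmp_milestones(date(2024, 1, 15), 48).items():
        print(f"{name}: {deadline.isoformat()}")
```

For a 48-month project starting on 15 January 2024, this sketch gives 15 July 2024 for the initial DMP, 15 January 2026 for the mid-term update and 15 January 2028 for the final version; a 24-month project gets no mid-term entry.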
At this stage, management plans are not assessed in a way that could affect grant allocation.
The document, which has six sections, is intended to anticipate each stage of project management and of the data life cycle: collection, documentation, storage, security, sharing, preservation and cost.
Several tools are available to help you.
- A guide to preparing data management plans, DMP OPIDoR (developed by the Institut national de l’information scientifique et technique, the French documentation center). Once you create an account, the tool guides you step by step through drafting your DMP, registering its versions, and sharing it with your partners for comments. Access to the DMP is restricted by default, but the settings can be changed to make it public.
- Sample (fictitious) DMPs published online following national open science events.
- A comparison of repositories that can host your data.
- An overview of metadata standards applicable to physics and chemistry.
- A tool for estimating costs, developed by TU Delft, the largest public technical university in the Netherlands. It estimates the staff time required (in FTE) based on the volume of data generated (less or more than 5 TB), whether the data processed are confidential, the number of partners and any personal privacy issues. Another tool, developed by EPFL in Switzerland, can also display structural costs (servers, electronic laboratory notebooks, repositories…).
Broader estimates typically allocate an average of 5% of a project's total budget to data management expenses.
- A tool for assessing how well your data management plan complies with the FAIR principles (Findable, Accessible, Interoperable, Reusable) for research data, developed by the Australian Research Data Commons (ARDC), a national Australian research infrastructure body.
- Tools to help you choose a distribution license for your datasets: a license selector available on GitHub, and the choosealicense platform.
- A list of best practices for publishing datasets online on Figshare. It is fairly comprehensive and worth consulting even if you do not plan to deposit data in that repository.
- If your project involves personal data, you must comply with the General Data Protection Regulation (GDPR), including a data protection impact assessment.
For this purpose, the CNIL (the French data protection authority) offers its open-source PIA tool as a free download.
Contacts are also listed on the Openaccess Couperin site and in the SOS DMP section, which lists the French university departments available to help researchers prepare their data management plans.
A few key dates for DMPs
1966: outlines of data management plans emerge in aeronautics.
1973: NASA publishes a technical report which resembles a DMP.
2006: the Medical Research Council (United Kingdom) requires the implementation of DMPs as a condition of funding.
2007: the Wellcome Trust (United Kingdom), now a member of cOAlition S (Plan S), requires DMPs as a condition of funding.
2007: the OECD publishes guidelines calling on scientific communities to document and archive research data.
2011: the National Science Foundation (United States) requires DMPs as a condition of funding.
2014: the EU requires DMPs for H2020 projects as a condition of funding.
2019: the ANR requires DMPs as a condition of funding.
Chronology based on: Smale, Nicholas, et al. “The History, Advocacy and Efficacy of Data Management Plans.” bioRxiv, October 2018. www.biorxiv.org, doi: 10.1101/443499.