Archiving: a sensitive question
When a project finishes, a decision must be made about what happens to the data it has produced. Can some of it be deleted? Or does it have enough intrinsic value to justify a long-term, and costly, archiving solution?
Several solutions exist for archiving data and preserving its integrity over time.
Based in Montpellier, CINES (Centre Informatique National de l’Enseignement Supérieur) is the major player in scientific archiving and ensures the longevity of data. The first step is to send a letter of intent to the director of CINES presenting the project, the type of data, the formats used, and the volume of the data set. If a data management plan has been prepared, all this information is already available in it. A project team is then assigned to archive the data, a process that usually takes between 6 months and 1 year. Costs vary depending on the type of service (number of copies on disk or magnetic tape) and the volume of data (3).
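Because the letter of intent must state the formats used and the volume of the data set, it can help to inventory the project directory first. The Python sketch below is a minimal illustration, not a CINES tool: it assumes that the file extension is a fair proxy for the format (format validation tools such as FACILE inspect file contents instead), and the function name `inventory` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def inventory(root: str) -> tuple[dict[str, int], int]:
    """Count files per extension and sum their sizes (in bytes) under `root`.

    ASSUMPTION: the extension is used as a rough stand-in for the file format;
    it does not validate the actual content of the files.
    """
    formats: Counter = Counter()
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Extensions are normalised to lower case; files without one
            # are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            formats[ext] += 1
            total_bytes += path.stat().st_size
    return dict(formats), total_bytes
```

Calling `inventory("path/to/project_data")` returns the number of files per extension and the total size in bytes; dividing the size by 10**12 gives the volume in decimal terabytes, assuming that is the unit used for billing.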
For smaller projects (less than 10 TB), the basic tariff is €1,043 (incl. VAT) per TB archived per year. The service includes one local copy on disk, one local copy on tape, and a replica of the data held within 300 km.
For larger projects (more than 100 TB), the cost is €221 (incl. VAT) per TB archived. In this case, the service includes two local copies on tape and remote replication of the data.
A fixed processing fee of €2,500 (incl. VAT) is payable in advance to cover the preparation of the service.
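The tariffs above can be turned into a rough budget estimate. This is a sketch under stated assumptions: the €221 rate for projects over 100 TB is treated as annual, like the small-project rate, which the text does not say explicitly; the €2,500 fee is counted once per project; and since no tariff is quoted here for 10 to 100 TB, that range raises an error rather than guessing. The function `cines_cost_estimate` is hypothetical, and current prices should be checked with CINES.

```python
FIXED_FEE_EUR = 2_500   # one-off preparation fee, incl. VAT (assumed charged once)
SMALL_RATE_EUR = 1_043  # per TB per year, projects under 10 TB, incl. VAT
LARGE_RATE_EUR = 221    # per TB, projects over 100 TB, incl. VAT
                        # ASSUMPTION: treated as a per-year rate


def cines_cost_estimate(volume_tb: float, years: int) -> float:
    """Rough total cost in euros (incl. VAT) of archiving `volume_tb` for `years`."""
    if volume_tb <= 0 or years <= 0:
        raise ValueError("volume and duration must be positive")
    if volume_tb < 10:
        rate = SMALL_RATE_EUR
    elif volume_tb > 100:
        rate = LARGE_RATE_EUR
    else:
        # No tariff is given for 10-100 TB: ask CINES instead of guessing.
        raise ValueError("no tariff quoted for 10-100 TB; contact CINES")
    return FIXED_FEE_EUR + rate * volume_tb * years
```

Under these assumptions, archiving 5 TB for 10 years comes to 2,500 + 1,043 × 5 × 10 = €54,650, which shows how quickly the annual per-TB rate dominates the fixed fee over a long retention period.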
The CINES website also provides access to FACILE, an online format validation tool. The platform lists the formats eligible for deposit and offers a contact function if expert assistance is required.
The EUDAT.eu platform mentioned earlier offers a “long-term data preservation system” through its B2SHARE service. However, unlike CINES, B2SHARE does not commit to keeping content readable over the long term.
The service is free of charge to all European researchers, whether or not they are affiliated with a research organization or university. Each data set receives a persistent identifier assigned by the platform. Certain basic metadata must be entered, such as the title and a description of the data. Additional metadata can also be provided, in particular through extensions and interfaces specific to certain communities.
As the name B2SHARE suggests, data can be published and shared among communities. However, users control access to their data and can restrict it if they prefer.
To improve data searches in B2SHARE, EUDAT has also incorporated an annotation service, B2NOTE. These annotations are used to classify data sets or files. Three types of annotation are possible. First, semantic tags drawn from existing ontologies (currently only from BioPortal (4), which provides ontologies for biology). Second, your own keywords, which you can create and attach when no suitable tag exists. Third, comments that describe the resource in more detail.
While this tool is not indispensable, it can improve the indexing of your own data and enable more refined searches across B2SHARE data.
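The three annotation types can be pictured as a simple data model on which searches are run. The Python sketch below is purely illustrative: it does not reproduce B2NOTE's actual storage format or API, and the class names, file names, and the example.org ontology IRI are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SEMANTIC_TAG = "semantic"  # term taken from an existing ontology
    KEYWORD = "keyword"        # free keyword, used when no ontology term fits
    COMMENT = "comment"        # longer free-text description of the resource


@dataclass(frozen=True)
class Annotation:
    target: str  # file or data set the annotation refers to (hypothetical ID)
    kind: Kind
    value: str   # ontology term IRI, keyword, or comment text


def find_targets(annotations: list[Annotation], kind: Kind, value: str) -> set[str]:
    """Return the targets carrying an annotation of `kind` whose value matches exactly."""
    return {a.target for a in annotations if a.kind is kind and a.value == value}


# Hypothetical annotations on two files.
ANNOTATIONS = [
    Annotation("file1.csv", Kind.SEMANTIC_TAG, "http://example.org/onto/Protein"),
    Annotation("file2.csv", Kind.KEYWORD, "pilot study"),
    Annotation("file1.csv", Kind.COMMENT, "Raw measurements from the first campaign"),
]
```

A semantic tag points to a shared ontology term, so two researchers who tag files with the same term can find each other's data even if they would have chosen different keywords; free keywords and comments are more flexible but only match what was actually typed.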
- (1) “The storage of data on laptops, external hard disks or storage devices such as USB sticks, is not recommended.” See the ANR’s data management plan (PGD) template: https://anr.fr/fileadmin/documents/2019/ANR-modele-PGD.pdf
- (2) “Zenodo makes no promises of usability and understandability of deposited objects over time.” https://about.zenodo.org/policies/
- (3) How to archive at CINES: https://www.cines.fr/archivage/comment-archiver-au-cines/
- (4) BioPortal: https://bioportal.bioontology.org/