Producing FAIR data

Data FAIRification

The FAIR principles (Findable, Accessible, Interoperable and Reusable) are a set of good practices which, when shared, promote transparency, reusability and reproducibility of research results, in line with the principles of open science.

The interactive infographic by DoraNum describes each principle and illustrates it in the context of research data.

These principles apply not only to data, but also to all research outputs, instruments and infrastructure. As they address key scientific issues (such as reproducibility and re-use), they have been adopted by many stakeholders within the research ecosystem (including policymakers, funding agencies and publishers) and applied across various fields of chemistry and physics.

A wide range of tools and initiatives have been developed to promote FAIR data: here is a selection!

General issues

Scientific issues

FAIR principles promote transparency, reproducibility and reusability not only of data but also of code, workflows (and hardware), and help to overcome barriers to the discovery and automated processing of data by machines, against a backdrop of growing volumes and increasing complexity of the data being processed. They enable the practical implementation of open science.

Broadly speaking, applying them within a research project may make the project itself more efficient.

Economic issues

A 2018 study by the European Commission, “Cost-Benefit analysis for FAIR research data – Cost of not having FAIR research data”, identifies the costs associated with data not being FAIR, in terms of wasted time, the cost of storing redundant data, and so on. These costs are estimated to amount to 3% of Europe’s research budget (but 78% of the H2020 programme budget), or €10.2 billion per year (in 2016).

The implementation of the FAIR principles varies at institutional level and within scientific communities:

  • According to the EOSC Observatory 2022 data, 34% of European countries are said to have a national policy on FAIR data.
  • In 2024, the publisher IOP published a study on data sharing based on a collection of over 30,000 articles in the field of physics; only one researcher in ten actually shares FAIR data. The breakdown by subject area showed that, with the exception of environmental sciences, FAIR data sharing is not widespread.

To find out more about the issues surrounding the FAIR principles, watch the presentation given at the NFDI Physical Science Joint Colloquium, “Fair Data: no longer optional, but it takes a village!”, by Dr Sansone, associate director of the Oxford e-Research Centre.

FAIR principles and the research ecosystem: what are the requirements?

Funders

Some funding bodies place the FAIR principles at the heart of their open science support schemes. The ANR describes the Data Management Plan in its FAQ as “…a tool to help facilitate discussion around research data with a view to making it FAIR (Findable, Accessible, Interoperable, Reusable)”.

In line with the report Turning FAIR into reality (2018), Horizon Europe recommends that data be shared “as open as possible, as closed as necessary”, and requires funded projects to have a Data Management Plan for FAIR data.

Publishers and scholarly societies

Some publishers formally endorse the FAIR principles and encourage authors to follow them, recognising the importance of data in the research process (see the blog post “Publishers and FAIR Data”).

This is the case for most generalist publishers (Wiley, Taylor & Francis).

As for publishers in the field of chemistry, ACS recommends the implementation of the FAIR principles on ethical grounds.

As for publishers in the field of physics, IOP encourages authors to share their research data using open formats and standards used by their community.

Physical Review journals share the same policy.

The editors of the Journal of Cheminformatics have been asked to consider making the use of FAIR Chemical Structures mandatory. In their response, the editors highlight the importance of the FAIR principles, but also the practical difficulties involved in making them mandatory. However, certain types of publications in this journal, known as Data Notes, focus on making data FAIR. They include a ‘Data Description’ section and two sub-sections: ‘Curation’ and ‘FAIR-ification’. Here is an example of such a publication.

The American Geophysical Union underlines in its Position Statement on Data the importance of open access to Earth science data in a variety of formats and of the long-term preservation of data. AGU journals adhere to the FAIR data principles.

Implementing the FAIR Principles

As described in 2016, the FAIR principles apply not only to data but also to other research outputs (such as code, hardware and infrastructure). See, for example, Cousijn, H. (2022). FAIR is everywhere.

Here we highlight a few areas where the FAIR principles are applied in physics and chemistry.

Data

Focus on chemistry

“Chemistry data should be FAIR, proponents say. But getting there will be a long road”, as highlighted in an editorial in ACS Chemical and Engineering News. Here we take a look at some of the networks, projects and initiatives launched to support the FAIR principles in chemistry.

The GO FAIR Chemistry Implementation Network (ChIN) is a consortium bringing together the chemistry community to promote standards, software and practices that align with the FAIR principles, in the spirit of the GO FAIR project. This article by Coles et al. (2020) presents the achievements of the network.

Along with CODATA and the RDA, IUPAC (International Union of Pure and Applied Chemistry) contributes to the WorldFAIR Project as applied to chemistry. In particular, this has led to the development of a compendium of best practices and a cookbook (IUPAC FAIR Chemistry Cookbook) offering self-study materials and tutorials. The project deliverables are also available online (WorldFAIR Project (D3.1) Digital recommendations for Chemistry FAIR data policy and practice).

As part of the FAIRSpec project, launched in 2019, IUPAC has drawn up a list of principles and recommendations to promote the FAIR principles for spectroscopic data (Archibald et al., 2025). The aim is to identify the information and metadata that must be included regardless of the data format used by equipment suppliers. FAIR data management is an ongoing process, in which the context of data collection and curation is essential. Standards should be designed to be “modular, extensible and flexible”. In addition to these general principles, the project is experimenting with a dataset processing workflow to produce a documented “IUPAC FAIRData Collection”.

Established in 2020, the German NFDI4Chem consortium continues its project: to provide the scientific community with an infrastructure for collecting, managing and disseminating FAIR data. The Chemistry Consortium’s Action Plan for 2025–2030 sets out its objectives: “Our Vision: All Chemists Publish FAIR Data”. Every year, an award is presented to recognise the most FAIR dataset (FAIR4Chem). Prof. Dr. Daumann, who won this award, provides her testimonial.

In the field of catalysis, several initiatives have been launched as part of the NOMAD project and the activities of the UK Catalysis Hub (UKCH). One of them, highlighted in Catalysis Communications, focuses on the creation of a portal that indexes data in the pipeline. The portal will display a score to assess whether a dataset is FAIR or not.

Focus on physics

Institutions such as CERN explicitly refer to the FAIR principles in their open science policy. See also their Open Science Portal.

In August 2025, ICFA (International Committee for Future Accelerators) approved and promoted the recommendations formulated by the “Data lifecycle” expert group. These Recommendations for Best Practices for Data Preservation and Open Science in HEP are also available on the ICFA site in an interactive, visual version.

NASA, in an article entitled Beyond Fair: Engagement, Data Usability, and Open Community Productivity through the NASA Open Science Data Repository (2024), also highlights the importance of ensuring the FAIRness of the data in its repository.

The European Open Science Cloud (EOSC) Photon and Neutron Data Service (ExPaNDS) published recommendations on data FAIRness in 2022: see Soler et al. (2022).

Since 2024, the data management policy of the European Synchrotron Radiation Facility (ESRF) has referred to the PaNOSC FAIR Research Data Policy in order to make ESRF data FAIR. See Favre-Nicolin, V. et al. (2024).

In the field of condensed matter, the FAIRmat consortium of the National Research Data Infrastructure (NFDI) provides the community with infrastructure, tools, standards and support to make research data FAIR.

Focus on environmental sciences

PANGAEA, an environmental science repository certified as a trusted repository, achieved the highest score in terms of FAIR criteria during the evaluation by the European Commission. A wiki enables researchers to adopt best practices for data description and dissemination. An article (Felder et al., 2023) published in Scientific Data provides a comprehensive overview of the organisation of PANGAEA.

The EPOS portal, the European Plate Observing System, and in particular Epos-France, which provides access to the inventory of data and metadata from the seismological data centre, also apply the FAIR recommendations in accordance with the international standards established by the Federation of Digital Seismograph Networks (FDSN), which makes them easier to use.

The Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) has adopted the FAIR Principles. In an article published in Nature, the implementation of FAIR principles by the Atlas repository is explained.

Code and research software

The first applications of the FAIR principles to research software were published in 2022 in an article in Scientific Data [Introducing the FAIR Principles for research software] following the work carried out by the FAIR working group on research software (FAIR4RS), which has been joined by the Research Software Alliance (ReSA), Future Of Research Communications and E-Scholarship (FORCE11) and the Research Data Alliance (RDA). They highlight the specific characteristics of software: its use (it is executed), its reusability (it can be modified, built or incorporated into other software), and the importance of version management and its development environment.

When applied to software, the FAIR principles encompass, in particular, the interaction between software systems through the exchange of data and/or metadata, and/or via application programming interfaces (APIs) defined by standards.

The FAIR principles for research software, FAIR4RS 2, are described in this document and in this article by Wilhelm Hasselbring et al. (2020).

The SoFAIR project aims to facilitate the discovery of and access to the software that researchers use, develop or share by linking publications to software, thereby addressing two of the FAIR principles. This international collaborative initiative therefore aims to improve the reusability of open research software. The project draws on widely used sources (CORE, Software Heritage, HAL) and tools (GROBID) to identify references to software in publications and facilitate their discoverability.

But are the FAIR principles really appropriate in the context of code and software? Di Cosmo et al. (2025) call this into question: “And so software comes with a double bind: like data, it supports the findings of a study and should be preserved and published. Yet it should also remain available and supported, and possibly be improved, over time. (…) Some programs have a weekly or even daily release cycle, making the FAIR approach impractical”. Their article, Code beyond FAIR, examines the specific characteristics of software in relation to the FAIR principles.

Workflows

Hardware, instruments and infrastructures are key components of the research ecosystem that contribute to the implementation of a FAIR approach.

In 2011, CERN launched its Open Hardware initiative, including the OHL open licence, the first of its kind for hardware. It is based on the following principle: anyone should be able to access the design documentation and technical drawings, enabling them to understand, study and reproduce a piece of equipment. The White Rabbit technology, designed for data acquisition and control in particle physics, is one example of how these principles, developed at CERN, are put into practice.

CERN hosted the first meeting of the Global Open Science Hardware (GOSH) initiative in 2016, which led to the drafting of a manifesto, emphasising the importance of equipment in an experimental scientific approach and in open science (see the manual).

Established in 2022, the RDA FAIR4RH Interest Group is dedicated to defining scientific hardware and applying the FAIR principles to it. The implementation of the FAIR principles is currently underway, following work carried out by Nadica Miljković et al., who identify, based on use cases, the open licences, trusted repositories (such as the Open Hardware Repository), metadata formats and identifiers used by open hardware.

Research infrastructures and technical platforms also play a vital role in defining procedures, promoting a FAIR culture and, of course, generating and managing FAIR data, as highlighted by Murphy et al. (2025). The ExPaNDS project, for instance, aims to establish a FAIR data management policy for particle physics infrastructures. More information here. Another initiative, launched in the United States, is that of the National Center for Atmospheric Research. A review of the FAIRification of instruments has highlighted the need to develop persistent identifiers for instruments, such as Research Resource Identifiers (RRIDs) or DOIs. This report emphasises the importance of uniquely identifying equipment, but notes the difficulties involved in implementing and adopting such a system (see also the Report on Recent Progress, Remaining Challenges and Emerging PIDs Strategies).

FAIR principles for AI?

In 2022, Argonne National Laboratory organised a workshop entitled “FAIR for Artificial Intelligence”. Several FAIR initiatives involving AI were presented on this occasion: FAIR4HEP (in high energy physics), ENDURABLE, HPC-FAIR (a framework to manage data and models used in various scientific contexts) and BioDataCatalyst. An article entitled “FAIR for AI: An interdisciplinary and international community building perspective”, published in Scientific Data, summarises this feedback.

FAIR AI relies on FAIR training datasets and FAIR models (models that include an identifier, metadata and instructions for running the model, as well as metrics for assessing the consistency of predictions and the model’s performance). This also involves depositing them in appropriate repositories (ML Commons, Data and Learning Hub for Science (DLHub)). At the same time, the work of the OSI (Open Source Initiative) has helped to define criteria suited to the context of ‘open-source’ AI, which is reusable, modifiable, shareable and open to scrutiny. Three groups of criteria call for a detailed description of the training data and the disclosure of all the code and parameters used (weights and model configuration). A Google research team proposes the use of Data Cards to document data in the context of AI.

In an article (FAIR AI Models in High Energy Physics) published on arXiv, researchers propose a practical definition of the FAIR principles in the context of data and artificial intelligence models for experimental research in high-energy physics.

  • Findable: the artificial intelligence model can be downloaded from GitHub, GitLab or Bitbucket;
  • Accessible: a standard, open, free protocol is available for retrieving the model using an identifier;
  • Interoperable: the metadata describing the artificial intelligence model comprehensively document all aspects of its structure, training and inputs;
  • Reusable: the software, tools and dependencies required to seamlessly invoke the artificial intelligence model are specified.
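The four criteria above can be read as a checklist applied to the description of a model. The following Python sketch is only an illustration of that reading, not the method of the cited article: the `ModelRecord` class, its field names, the list of required metadata keys and the set of protocols treated as "open" are all assumptions made for this example, and the identifier and URL are placeholders.

```python
from dataclasses import dataclass, field

# Assumption: protocols treated as "standard, open and free" in this sketch.
OPEN_PROTOCOLS = {"http", "https"}

# Assumption: metadata keys standing for "structure, training and inputs".
REQUIRED_METADATA = ("structure", "training", "inputs")


@dataclass
class ModelRecord:
    """Hypothetical description of a published AI model."""
    repository_url: str = ""      # where the model can be downloaded
    identifier: str = ""          # identifier used to retrieve the model
    access_protocol: str = ""     # protocol used for retrieval
    metadata: dict = field(default_factory=dict)
    dependencies: list = field(default_factory=list)
    run_instructions: str = ""    # how to invoke the model


def fair_checklist(record: ModelRecord) -> dict:
    """Return one boolean per FAIR criterion, following the list above."""
    return {
        "findable": bool(record.repository_url),
        "accessible": bool(record.identifier)
        and record.access_protocol.lower() in OPEN_PROTOCOLS,
        "interoperable": all(record.metadata.get(key) for key in REQUIRED_METADATA),
        "reusable": bool(record.dependencies) and bool(record.run_instructions),
    }


# Placeholder values, for illustration only.
example = ModelRecord(
    repository_url="https://github.com/example-org/example-model",
    identifier="doi:10.0000/example",
    access_protocol="https",
    metadata={
        "structure": "3-layer convolutional network",
        "training": "trained on a public example dataset",
        "inputs": "64x64 grayscale images",
    },
    dependencies=["numpy"],
    run_instructions="python predict.py --input image.png",
)
report = fair_checklist(example)
print(report)
```

A checklist of this kind only verifies that the information is present; it says nothing about the quality of the metadata, which still has to be judged by a person.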

An article published in Scientific Data, FAIR principles for AI models, presents an example applied to high energy diffraction microscopy.

Fig. 1 in Ravi, N., Chaturvedi, P., Huerta, E.A. et al. FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy. Sci Data 9, 657 (2022). https://doi.org/10.1038/s41597-022-01712-9

Resources for data FAIRification

Check-lists and practical guides

  • The “minimum à FAIR” [FR] outlines key considerations and practical advice at each stage of the data lifecycle to ensure data is FAIR.
  • Utrecht University FAIR Cheat Sheets: practical guidelines for data and code.

General assessment tools

There are several tools available to assess the extent to which research data complies with the FAIR principles.

With the development of generative AI, tools have been developed to help researchers make their data FAIR. They are often still at the experimental stage (such as this tool, called FAIR) and should be approached with caution: is there a risk that data relating to your project could be disclosed or reused to train the model? Is the information generated accurate or fabricated? What is the environmental cost of using this tool?

Websites

There are many websites that provide further information on the FAIR principles and offer training and awareness-raising resources.

  • How to FAIR? A website published by Danish universities with the support of the Research Data Alliance. E-learning pages and modules cover topics such as file formats, metadata, licences, identifiers, data access, etc. Numerous examples of applications in a research context and feedback from researchers provide a practical understanding of the FAIR principles. Please note: the site’s content (texts, interview transcripts, images) is available on Zenodo and is reusable.
  • FAIRsharing. A comprehensive, multidisciplinary website. It lists standards and repositories that comply with the FAIR principles, as well as the requirements and recommendations of funding bodies. For example: FAIRsharing.org: ThermoML; ThermoML, DOI: 10.25504/FAIRsharing.7b0fc3.

Games

  • Faut pas s’en FAIR [FR]: an extension of the cooperative game GopenDoRe designed to familiarise players with the FAIR principles
  • Principes FAIR [FR]: A cooperative game to help you understand how to make your data as FAIR as possible (in the field of ecology)