Establishing a data management plan

Standard formats in physics

Using open formats is one way to make scientific data Findable, Accessible, Interoperable and Reusable (FAIR).

Below is a non-exhaustive list of open standard formats used in physics.

HDF5
Hierarchical Data Format version 5 (extensions .hdf5 and .h5) is designed for extremely large, complex and heterogeneous data. It is used notably in physics, astronomy, earth sciences, engineering, genomics and computational fluid dynamics. Originally developed at the National Center for Supercomputing Applications (NCSA), it is now supported by The HDF Group, a non-profit corporation whose mission is to ensure the continued development of HDF5 technologies and the accessibility of data stored in HDF. HDF5 uses a structure similar to a file directory: data are organized into groups and datasets inside a single file, much as you would organize folders and files on your computer. HDF5 files can also embed metadata (attributes), which makes them self-describing. Many software tools common in academic research support HDF5, including LabVIEW, MATLAB, Mathematica, ParaView and Matplotlib. HDF5 can also be used from many programming languages, such as Fortran, Python, R, .NET and Julia. See the documentation for more information, as well as a video on using HDF5 files via HDFView, a visual tool for exploring and editing HDF4 and HDF5 files.
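HDF5 files are normally opened through a library (for example h5py in Python) that exposes the groups, datasets and attributes described above. As a dependency-free sketch, the snippet below only shows how a tool can recognise an HDF5 file from its content rather than its extension: every HDF5 superblock starts with the same 8-byte signature, located at byte 0 or, when a user block precedes it, at byte 512, 1024, 2048 and so on. The function name and the 1 MiB search limit are our own illustrative choices.

```python
import io

# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_superblock(stream, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature in a binary stream, or None.

    The superblock sits at offset 0, or at 512, 1024, 2048, ... when a user
    block precedes it. `max_offset` is an arbitrary search limit for this sketch.
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        if stream.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# Example: an in-memory "file" with a 512-byte user block before the superblock.
demo = io.BytesIO(b"\x00" * 512 + HDF5_SIGNATURE + b"\x00" * 64)
print(find_hdf5_superblock(demo))  # 512
```

With a real file, pass the handle from `open(path, "rb")`. Because HDF5-based NeXus files, ASDF seismology files and netCDF-4 files are HDF5 files underneath, the same check also matches them.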
NeXus
NeXus (extensions .nxs and .nex) is a common data format for neutron, X-ray and muon science. It is being developed as an international standard by scientists and programmers representing major scientific facilities, to facilitate greater cooperation in the analysis and visualization of neutron, X-ray and muon data. NeXus is built on top of the scientific data format HDF5: it adds domain-specific rules for organizing data within HDF5 files, together with a dictionary of well-defined, domain-specific field names. An Application Program Interface (API) has been developed to facilitate reading and writing NeXus files. See the NeXus documentation for more information.
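NeXus tags each HDF5 group with an NX_class attribute (for example NXentry for an experiment entry and NXdata for plottable data), and the NeXus conventions use "default" and "signal" attributes to lead from the file root to the main plottable dataset. The sketch below illustrates that navigation without any HDF5 library: the file is modelled as hypothetical nested Python dictionaries, with each group's attributes stored under an "attrs" key. With a real file, the same lookups would go through an HDF5 or NeXus API, and the attribute conventions should be checked against the NeXus documentation.

```python
def find_default_signal(root):
    """Follow NeXus 'default' attributes down to the plottable signal.

    `root` is a hypothetical stand-in for an HDF5 file: groups are dicts
    whose attributes live under the "attrs" key; datasets are plain lists.
    """
    entry = root[root["attrs"]["default"]]
    if entry["attrs"].get("NX_class") != "NXentry":
        raise ValueError("default entry is not an NXentry group")
    data = entry[entry["attrs"]["default"]]
    if data["attrs"].get("NX_class") != "NXdata":
        raise ValueError("default data is not an NXdata group")
    signal_name = data["attrs"]["signal"]
    return signal_name, data[signal_name]


# Hypothetical diffraction scan, organised the NeXus way.
scan = {
    "attrs": {"default": "entry"},
    "entry": {
        "attrs": {"NX_class": "NXentry", "default": "data"},
        "data": {
            "attrs": {"NX_class": "NXdata", "signal": "counts", "axes": "two_theta"},
            "counts": [10, 42, 17],
            "two_theta": [20.0, 20.5, 21.0],
        },
    },
}
print(find_default_signal(scan))  # ('counts', [10, 42, 17])
```

The class attributes are what let generic NeXus software find the data without knowing the group names chosen by a particular facility.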
JCAMP-DX
JCAMP-DX is a plain-text format for storing and exchanging spectroscopic data such as Raman, UV, NMR, EPR and mass spectra, as well as chromatograms, thermograms, etc. Spectra in JCAMP-DX format can be read with open-source software such as JDXview and JSpecView. Here is the link for further information.
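A JCAMP-DX file is a series of labelled records of the form ##LABEL=value, with the spectrum itself following a data label. The minimal reader below assumes a single spectrum stored as uncompressed (XY..XY) pairs after ##XYPOINTS=; the compressed tables (such as ##XYDATA=(X++(Y..Y))) found in many real files are not handled, so a dedicated reader remains preferable for production use. The sample spectrum is hypothetical.

```python
import re

# A number such as 400, 400.0, .5 or -1.2E-3.
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?"
PAIR = re.compile(rf"({NUMBER})\s*,\s*({NUMBER})")


def parse_jcamp(text):
    """Parse a simple single-spectrum JCAMP-DX file.

    Returns (labels, points): labels maps each ##LABEL to its raw value string,
    points is a list of (x, y) floats read from an ##XYPOINTS=(XY..XY) table.
    """
    labels, points = {}, []
    in_xy_table = False
    for line in text.splitlines():
        line = line.split("$$")[0].strip()  # "$$" starts a comment
        if line.startswith("##"):
            key, _, value = line[2:].partition("=")
            key = key.strip().upper()
            labels[key] = value.strip()
            in_xy_table = key == "XYPOINTS"
        elif in_xy_table:
            points.extend((float(x), float(y)) for x, y in PAIR.findall(line))
    return labels, points


example = """##TITLE=Example spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=ABSORBANCE
##NPOINTS=3
##XYPOINTS=(XY..XY)
400.0, 0.10; 401.0, 0.12
402.0, 0.15
##END=
"""
labels, points = parse_jcamp(example)
print(labels["TITLE"], points)
```

Comparing the number of parsed points with ##NPOINTS is a cheap sanity check when converting files.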
NetCDF
Network Common Data Form (extensions .nc and .cdf) is a format widely used in oceanography and meteorology to store variables such as temperature, pressure, wind speed and wave height. It supports the creation, access and sharing of array-oriented scientific data. The Unidata Program Center supports and maintains the netCDF programming interfaces for C, C++, Java and Fortran. Interfaces are also available for Python, IDL, MATLAB, R, Ruby and Perl. Software that can be used to manipulate or display netCDF data is listed here. HDF Explorer is a data visualization program that reads Hierarchical Data Format files as well as netCDF data files.
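The name "netCDF" covers several binary flavours, which matters when choosing a reader. A small sketch, assuming the file's first 8 bytes are already in memory: classic files start with b"CDF\x01", 64-bit-offset files with b"CDF\x02" and CDF-5 files with b"CDF\x05", while netCDF-4 files are HDF5 files and start with the HDF5 signature (this sketch assumes no user block precedes it). An HDF5 signature alone does not prove the file follows netCDF-4 conventions.

```python
# Leading bytes that identify each netCDF flavour.
NETCDF_MAGIC = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "CDF-5 (64-bit data)",
}
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def netcdf_flavour(header: bytes) -> str:
    """Classify a file from its first 8 bytes."""
    if header[:4] in NETCDF_MAGIC:
        return NETCDF_MAGIC[header[:4]]
    if header[:8] == HDF5_SIGNATURE:
        return "HDF5-based (possibly netCDF-4)"
    return "unknown"


# Usage with a real file (the path is a placeholder):
# with open("ocean_temperature.nc", "rb") as f:
#     print(netcdf_flavour(f.read(8)))
print(netcdf_flavour(b"CDF\x01\x00\x00\x00\x00"))  # classic
```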
FITS
The Flexible Image Transport System (extensions .fits, .fts, .fit) is a standard data format in astronomy, approved by NASA and by the International Astronomical Union (IAU); the standard is maintained by the IAU FITS Working Group. In addition to detailed ASCII metadata (the header), a FITS file can contain a series of images acquired from different sources and in different spectral ranges, as well as multidimensional data tables. Standard libraries provide FITS support for most languages used in scientific computing, such as C, Fortran, Java, Perl, Python and IDL.
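In practice a library such as astropy.io.fits handles FITS files in Python. To show what the "ASCII metadata" looks like on disk, the standard-library sketch below writes a minimal single-image file: the header is a sequence of 80-character "cards" (keyword, "= ", value), terminated by END and padded with spaces to a multiple of 2880 bytes, followed by big-endian pixel data padded with zeros to the same block size. It only supports integer and logical values and 16-bit pixels; the comments on each card are our own.

```python
import struct

BLOCK = 2880  # FITS headers and data are padded to multiples of 2880 bytes


def card(keyword, value=None, comment=""):
    """Build one 80-character header card (integer/logical values only).

    Fixed format: keyword in columns 1-8, "= " in columns 9-10, value
    right-justified to column 30, then an optional " / comment".
    """
    if value is None:
        text = keyword
    else:
        if isinstance(value, bool):
            value = "T" if value else "F"
        text = f"{keyword:<8}= {value!s:>20}"
        if comment:
            text += f" / {comment}"
    return text.ljust(80)[:80]


def write_fits_image(rows):
    """Return the bytes of a minimal single-image FITS file of 16-bit integers."""
    height, width = len(rows), len(rows[0])
    header = "".join([
        card("SIMPLE", True, "conforms to FITS standard"),
        card("BITPIX", 16, "16-bit signed integers"),
        card("NAXIS", 2, "number of axes"),
        card("NAXIS1", width, "length of the fastest-varying axis"),
        card("NAXIS2", height, "number of rows"),
        card("END"),
    ]).encode("ascii")
    header += b" " * (-len(header) % BLOCK)
    # Pixels are stored big-endian, row by row (NAXIS1 varies fastest).
    data = b"".join(struct.pack(f">{width}h", *row) for row in rows)
    data += b"\x00" * (-len(data) % BLOCK)
    return header + data


def read_header(raw):
    """Return {keyword: value string} from the first header of a FITS file."""
    values = {}
    for start in range(0, len(raw), 80):
        text = raw[start:start + 80].decode("ascii")
        keyword = text[:8].strip()
        if keyword == "END":
            break
        if text[8:10] == "= ":
            values[keyword] = text[10:].split("/")[0].strip()
    return values


image = write_fits_image([[1, 2, 3], [4, 5, 6]])
print(len(image), read_header(image)["NAXIS1"])  # 5760 3
```

Because the header is plain ASCII, it stays human-readable with any text viewer, which is part of why FITS archives remain accessible decades later.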
ASDF Astronomy
The Advanced Scientific Data Format (extension .asdf) is used in particular for astronomical images. It was developed to address most of the current problems with the FITS format. The metadata are contained in a YAML header (YAML Ain't Markup Language), followed by binary or ASCII data. Here is the link for further information. The asdf Python package can be used to create and modify ASDF files.
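The asdf Python package is the usual way to read and write these files. The stdlib-only sketch below merely illustrates the layout described above: a "#ASDF" header line, a YAML tree closed by the YAML end-of-document marker "...", then binary blocks, each starting with a 4-byte magic token. The header strings and block token follow the ASDF standard as we understand it and should be checked against the specification; the sample file is hypothetical and its block is truncated.

```python
TREE_END = b"\n...\n"      # YAML document-end marker closing the tree
BLOCK_MAGIC = b"\xd3BLK"   # token that starts each binary block


def split_asdf(raw: bytes):
    """Split ASDF bytes into (yaml_text, binary_part).

    Assumes the file starts with the "#ASDF" header line and that the YAML
    tree ends with a line containing only "...", followed by binary blocks.
    """
    if not raw.startswith(b"#ASDF "):
        raise ValueError("missing #ASDF header line")
    end = raw.find(TREE_END)
    if end == -1:
        raise ValueError("YAML tree end marker not found")
    cut = end + len(TREE_END)
    return raw[:cut].decode("utf-8"), raw[cut:]


# Hypothetical minimal file: header, YAML tree, then one (truncated) block.
sample = (b"#ASDF 1.0.0\n"
          b"#ASDF_STANDARD 1.5.0\n"
          b"%YAML 1.1\n"
          b"--- !core/asdf-1.1.0\n"
          b"telescope: demo\n"
          b"...\n" + BLOCK_MAGIC + b"\x00" * 8)
tree, blocks = split_asdf(sample)
print(tree.splitlines()[-2], blocks[:4] == BLOCK_MAGIC)
```

Keeping the metadata as YAML text means it can be inspected and even edited with ordinary tools, while large arrays stay in compact binary blocks.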
PDS
The Planetary Data System (PDS) file format (extensions .img, .imq, .lbl, .pds) is a standard devised by the planetary branch of the National Aeronautics and Space Administration (NASA) for storing solar, lunar and planetary data collected on Earth and by interplanetary spacecraft.
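PDS products are described by labels, either attached at the start of the data file or stored separately (.lbl). The sketch below parses the older PDS3 style of label, written in ODL as KEYWORD = VALUE lines, with OBJECT / END_OBJECT pairs describing nested objects such as an image and a pointer (^IMAGE) to where the image data start. It assumes one statement per line and no comments or multi-line values; the example label is hypothetical. Current PDS4 labels are XML and need an XML parser instead.

```python
def parse_pds3_label(text):
    """Parse a simple PDS3 label into nested dicts.

    Assumptions: one KEYWORD = VALUE statement per line, no multi-line
    values or comments; OBJECT / END_OBJECT pairs open and close nesting.
    """
    root = {}
    stack = [root]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "END":
            break
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if key == "OBJECT":
            stack[-1][value] = {}
            stack.append(stack[-1][value])
        elif key == "END_OBJECT":
            stack.pop()
        else:
            stack[-1][key] = value
    return root


label = """PDS_VERSION_ID = PDS3
RECORD_TYPE = FIXED_LENGTH
^IMAGE = 3
OBJECT = IMAGE
  LINES = 512
  LINE_SAMPLES = 512
  SAMPLE_TYPE = MSB_INTEGER
  SAMPLE_BITS = 16
END_OBJECT = IMAGE
END
"""
parsed = parse_pds3_label(label)
print(parsed["IMAGE"]["LINES"], parsed["^IMAGE"])
```

The IMAGE object's keywords (size, sample type, bits per sample) are exactly what a reader needs to interpret the raw bytes of the .img file.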
SEG-Y
The SEG-Y file format (extensions .segy and .sgy) is one of several standards developed by the Society of Exploration Geophysicists (SEG) for storing seismic data.
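Libraries such as segyio or ObsPy read SEG-Y files in Python. As a stdlib-only sketch of the layout, the code below reads three fields of the binary file header: a SEG-Y file starts with a 3200-byte textual header (traditionally EBCDIC), then a 400-byte big-endian binary header, then the traces, each preceded by a 240-byte trace header. It assumes every trace has the number of samples stated in the binary header, and it does not convert IBM floating-point samples. The header values in the example are hypothetical.

```python
import struct

# Data sample format codes (binary header bytes 3225-3226) -> (name, bytes per sample).
SAMPLE_FORMATS = {
    1: ("4-byte IBM floating point", 4),
    2: ("4-byte two's complement integer", 4),
    3: ("2-byte two's complement integer", 2),
    5: ("4-byte IEEE floating point", 4),
    8: ("1-byte two's complement integer", 1),
}
TEXT_HEADER = 3200    # textual file header (traditionally EBCDIC)
TRACE_HEADER = 240    # header preceding every trace


def read_segy_binary_header(raw: bytes):
    """Read a few fields of the 400-byte big-endian binary file header."""
    interval_us, = struct.unpack_from(">H", raw, TEXT_HEADER + 16)  # bytes 3217-3218
    n_samples, = struct.unpack_from(">H", raw, TEXT_HEADER + 20)    # bytes 3221-3222
    fmt_code, = struct.unpack_from(">H", raw, TEXT_HEADER + 24)     # bytes 3225-3226
    fmt_name, sample_size = SAMPLE_FORMATS[fmt_code]
    return {
        "sample_interval_us": interval_us,
        "samples_per_trace": n_samples,
        "sample_format": fmt_name,
        # Each trace on disk = 240-byte trace header + its samples.
        "trace_bytes": TRACE_HEADER + n_samples * sample_size,
    }


# Hypothetical file: blank textual header plus a binary header declaring
# 2 ms sampling, 1500 samples per trace and IEEE floats.
raw = bytearray(TEXT_HEADER + 400)
struct.pack_into(">H", raw, TEXT_HEADER + 16, 2000)
struct.pack_into(">H", raw, TEXT_HEADER + 20, 1500)
struct.pack_into(">H", raw, TEXT_HEADER + 24, 5)
print(read_segy_binary_header(bytes(raw)))
```

Knowing the trace size lets a reader jump directly to trace n at byte 3600 + n * trace_bytes, which is why fixed-length traces are common in practice.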
ASDF Seismology
The Adaptable Seismic Data Format, based on HDF5, can store an unlimited number of synthetic, processed or unmodified waveforms in a single file, together with information about the corresponding events and stations. Full provenance information can be stored with each data item. Here is the link for further information.
GRIB
GRIB (GRIdded Binary, or General Regularly-distributed Information in Binary form; extensions .grib, .grb, .gb) is a data format used to store historical and forecast weather data. The GRIB standard was designed and is maintained by the Commission for Basic Systems of the World Meteorological Organization (WMO).
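A GRIB file is simply a sequence of self-contained messages, which is why files can be concatenated or split freely. The sketch below walks those messages using only the indicator section (section 0): GRIB edition 1 stores a 3-byte total length followed by the edition number, GRIB edition 2 stores the edition number in byte 8 and an 8-byte total length, and every message ends with "7777". Decoding the actual fields requires a tool such as ecCodes or wgrib2; the two messages in the example are hypothetical and empty.

```python
def iter_grib_messages(raw: bytes):
    """Yield (offset, edition, length) for each GRIB message in `raw`.

    Section 0 layout:
      GRIB1: "GRIB", 3-byte total length, 1-byte edition number
      GRIB2: "GRIB", 2 reserved bytes, discipline, edition number, 8-byte total length
    Every message ends with "7777".
    """
    pos = raw.find(b"GRIB")
    while pos != -1:
        edition = raw[pos + 7]
        if edition == 1:
            length = int.from_bytes(raw[pos + 4:pos + 7], "big")
        elif edition == 2:
            length = int.from_bytes(raw[pos + 8:pos + 16], "big")
        else:
            raise ValueError(f"unsupported GRIB edition {edition} at byte {pos}")
        if raw[pos + length - 4:pos + length] != b"7777":
            raise ValueError(f"message at byte {pos} does not end with 7777")
        yield pos, edition, length
        pos = raw.find(b"GRIB", pos + length)


# Hypothetical stream: an (empty) GRIB2 message followed by an (empty) GRIB1 message.
grib2 = b"GRIB" + b"\x00\x00" + bytes([0, 2]) + (20).to_bytes(8, "big") + b"7777"
grib1 = b"GRIB" + (12).to_bytes(3, "big") + bytes([1]) + b"7777"
print(list(iter_grib_messages(grib2 + grib1)))  # [(0, 2, 20), (20, 1, 12)]
```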
EDF
The European Data Format (EDF) is a simple and flexible format for the exchange and storage of multichannel biological and physical signals. An extension named EDF+, developed in 2002, is largely compatible with EDF and can also store annotations and analysis results.
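Part of EDF's simplicity is its fixed layout: a 256-byte ASCII header, then 256 bytes of ASCII header per signal (stored field by field: all labels, then all transducer types, and so on), then data records of 2-byte little-endian integer samples. Each signal's header gives the digital and physical ranges used to convert samples into physical units. The sketch below parses that layout with the standard library only; the field names are our own labels, and EDF+ annotation signals ("EDF Annotations") are not interpreted.

```python
import struct

# (field name, width in bytes) of the 256-byte fixed part of the EDF header.
FIXED_FIELDS = [("version", 8), ("patient_id", 80), ("recording_id", 80),
                ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
                ("reserved", 44), ("n_records", 8), ("record_seconds", 8),
                ("n_signals", 4)]
# Per-signal fields: each is stored for ALL signals before the next field.
SIGNAL_FIELDS = [("label", 16), ("transducer", 80), ("physical_dimension", 8),
                 ("physical_min", 8), ("physical_max", 8), ("digital_min", 8),
                 ("digital_max", 8), ("prefiltering", 80),
                 ("samples_per_record", 8), ("reserved", 32)]


def read_edf_header(raw: bytes):
    """Parse the ASCII header (256 + 256 * number_of_signals bytes) of an EDF file."""
    header, pos = {}, 0
    for name, width in FIXED_FIELDS:
        header[name] = raw[pos:pos + width].decode("ascii").strip()
        pos += width
    signals = [{} for _ in range(int(header["n_signals"]))]
    for name, width in SIGNAL_FIELDS:
        for signal in signals:
            signal[name] = raw[pos:pos + width].decode("ascii").strip()
            pos += width
    header["signals"] = signals
    return header


def decode_samples(chunk: bytes):
    """EDF samples are 2-byte little-endian two's-complement integers."""
    return list(struct.unpack(f"<{len(chunk) // 2}h", chunk))


def to_physical(digital, signal):
    """Map a digital value linearly onto the signal's physical range."""
    pmin, pmax = float(signal["physical_min"]), float(signal["physical_max"])
    dmin, dmax = int(signal["digital_min"]), int(signal["digital_max"])
    return pmin + (digital - dmin) * (pmax - pmin) / (dmax - dmin)
```

For a signal recorded with digital range -32768 to 32767 and physical range -3200 to 3200 uV, the extreme digital values map onto the physical extremes, and intermediate samples scale linearly between them.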