CERN announces new open data policy
The four main LHC collaborations (ALICE, ATLAS, CMS and LHCb) have unanimously endorsed a new open data policy for scientific experiments at the Large Hadron Collider (LHC), which was presented to the CERN Council today. The policy commits to publicly releasing the so-called level 3 scientific data collected by the LHC experiments, the type required to carry out scientific studies. Data will start to be released approximately five years after collection, and the aim is for the full dataset to be publicly available by the close of the experiment concerned. The policy responds to the growing open science movement, which aims to make scientific research more reproducible, accessible, and collaborative.
The level 3 data released can contribute to scientific research in particle physics, as well as research in the field of scientific computing, for example to improve reconstruction or analysis methods based on machine learning techniques, an approach that requires rich data sets for training and validation.
“The open data policy reflects CERN’s commitment to open science, which was already asserted in the CERN Convention over 60 years ago,” said Eckhard Elsen, CERN Director for Research and Computing. “The policy sets out the concrete steps towards its implementation at CERN, which will make data available to the extended scientific community as well as the general public.”
Scientific data are considered to have different levels of complexity. Level 3 data are of the type used as input to most physics studies and will be released alongside the software and documentation needed to use the data. Their release will allow high-quality analysis by diverse groups: non-CERN scientists, scientists in other fields, educational and outreach initiatives, and the general public.
The policy also covers the release of level 1 and level 2 datasets, of which samples are already available. Level 1 corresponds to the supporting information of results published in scientific articles, and level 2 corresponds to dedicated scientific datasets designed for educational and outreach purposes.
In practice, scientific datasets will be released through the CERN Open Data Portal, which already hosts a comprehensive set of data related to the LHC and other experiments. Data will be made available in line with FAIR standards, a set of data guidelines that ensure the data are findable, accessible, interoperable, and reusable.
“The policy provides a progressive framework for the openness and preservation of experimental data,” said Jamie Boyd, convener of the working group that formulated the policy. This strategy complements CERN’s existing Open Access policy, which mandates that all CERN research results are published in open access. It is also aligned with the recent European Strategy for Particle Physics Update announced in June 2020. The new policy could be used as a blueprint for other experiments at CERN and in other scientific organisations.