Reproducibility in Science

A main focus for IDIA is the development of tools to do science in data-intensive fields. A major challenge to this end, and indeed in all of modern science, is reproducibility. The scientific community needs to be able to reproduce results independently in order to trust them and incorporate them into the global body of scientific knowledge. Over the past twenty years, a growing portion of published scientific results has overstated its statistical significance and cannot be replicated. This threatens to undermine the credibility of scientific inquiry, both within the scientific community and in the public eye.

In response, a flurry of discussion and research on how to fix the scientific process has begun, especially at the level of institutions and funding agencies. The first recommended course of action is often to make experimental data open and more widely accessible. In principle, this makes it easier for independent scientists to verify results. It also helps preserve the original knowledge gained from expensive experiments and leaves room for further discoveries in the future.

While open data is a necessity, it is not sufficient to achieve full reproducibility. This is especially true of data-intensive research, which is complicated by large datasets and the computational resources needed to process them. The technical and human complexity inherent in modern fields of big data science, such as radio astronomy, genomics, and high energy physics, has made it progressively more difficult to validate and replicate results.

For example, an independent researcher seeking to reproduce a published result in radio astronomy would find it prohibitively expensive to replicate even a portion of the analysis pipeline. Therefore, rather than simply making results and datasets open to the public, it is necessary to design scientific analysis with reproducibility in mind from the very beginning. To make this a reality, researchers must create tools and services that streamline the way scientists document, share, and preserve their full analysis pipelines before publishing a result.

Because IDIA is advancing infrastructure for scientific computing and analysis techniques while also conducting novel scientific research, it is uniquely positioned to develop tools that transform the way science is conducted. Software engineers and data scientists are collaborating with researchers to test existing software and develop new solutions that preserve and share data, code, software, computing environments, workflows, documentation, and results.
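As a rough illustration of what preserving data, parameters, and computing environment together can mean in practice, the sketch below writes a small provenance manifest for one analysis step. It is not an IDIA tool: the function names (sha256_of, build_manifest, write_manifest) and the manifest fields are hypothetical, chosen only for this example, and a real pipeline would record far more (software versions, container images, workflow definitions).

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs, parameters):
    """Collect basic facts someone would need to rerun an analysis step.

    The field names are illustrative assumptions, not a standard.
    """
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "command": list(sys.argv),
        "parameters": dict(parameters),
        # Checksums let a later reader confirm they hold the same input data.
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }


def write_manifest(manifest, destination):
    """Store the manifest as sorted, indented JSON next to the results."""
    Path(destination).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A script might call build_manifest on its input files and settings just before producing a result, then publish the manifest alongside it, so that a reader can check that they are working from the same inputs and a comparable environment.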

The same tools that enable reusable analyses also tend to make existing scientific procedures easier and faster. In addition to encouraging preservation and transparency, they help reduce redundancy and make it easier for scientists to collaborate and share their work. Thus, the mission of reproducibility fits seamlessly with the mission to build a modern, next-generation scientific computing environment.