Journal:Data without software are just numbers

Full article title: Data without software are just numbers
Journal: Data Science Journal
Author(s): Davenport, James H.; Grant, James; Jones, Catherine M.
Author affiliation(s): University of Bath, Science and Technology Facilities Council
Primary contact: Email: J dot H dot Davenport at bath dot ac dot uk
Year published: 2020
Volume and issue: 19(1)
Article #: 3
DOI: 10.5334/dsj-2020-003
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International
Website: https://datascience.codata.org/articles/10.5334/dsj-2020-003/
Download: https://datascience.codata.org/articles/10.5334/dsj-2020-003/galley/929/download/ (PDF)

Abstract

Great strides have been made to encourage researchers to archive data created by research and to provide the necessary systems to support their storage. Additionally, it is recognized that data are meaningless unless their provenance is preserved through appropriate metadata. Alongside this is a pressing need to ensure the quality and archiving of the software that generates data, through simulation and the control of experiments or data collection, and of the software that analyzes, modifies, and draws value from raw data. In order to meet the aims of reproducibility, we argue that data management alone is insufficient: it must be accompanied by good software practices, the training to facilitate them, and the support of stakeholders, including appropriate recognition for software as a research output.

Keywords: software citation, software management, reproducibility, archiving, research software engineer

Introduction

Context

In the last decade, there has been a drive towards improved research data management in academia, moving away from the model of "supplementary material" that did not fit in publications, towards the requirement that all data supporting research be made available at the time of publication. In the U.K., for example, the Research Councils have a Concordat on Open Research Data[1], and the E.U.'s Horizon 2020 program incorporates similar policies on data availability.[2] The FAIR principles[3], which state that data should be findable, accessible, interoperable, and re-usable, embody the underlying philosophy: data should be preserved through archiving with a persistent identifier, well described with suitable metadata, and handled in a way that is relevant to the domain. Together with the Open Access movement, this has produced a profound transformation in the availability of research and the data supporting it.
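As a concrete illustration of FAIR-style description (ours, not part of the original article), the following Python sketch builds a minimal machine-readable metadata record for a dataset, with a persistent identifier, licensing, and a link to the software that produced it, and serializes it for deposit alongside the data. The DOI values, field names, and file name are all hypothetical.

  # Illustrative sketch: a minimal metadata record supporting the FAIR
  # principles, serialized as JSON for deposit alongside the dataset.
  # All identifiers, field names, and file names below are hypothetical.
  import json

  metadata = {
      "identifier": "https://doi.org/10.5281/zenodo.0000000",  # persistent ID (hypothetical DOI)
      "title": "Simulated trajectories, run 42",
      "creators": ["A. Researcher (University of Example)"],
      "description": "Raw output of the simulation described in the paper.",
      "license": "CC-BY-4.0",       # supports re-usability
      "formats": ["text/csv"],      # supports interoperability
      "relatedSoftware": "https://doi.org/10.5281/zenodo.0000001",  # archived code that made the data
      "dateCreated": "2020-10-01",
  }

  # Writing the record as JSON keeps it findable and accessible to both
  # human readers and harvesting services.
  with open("dataset-metadata.json", "w") as f:
      json.dump(metadata, f, indent=2)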

While this is a great stride towards transparency, it does not by itself improve the quality of research, and even what exactly transparency entails remains debated.[4] A common theme discussed in many disciplines is the need for a growing emphasis on "reproducibility."[5][6][7] This goes beyond data itself, requiring software and analysis pipelines to be published in a usable state alongside papers. Spreading such good practices requires a coordinated effort: training in professional programming methods in academia, recognition of the role of research software and the effort required to develop it, and storage of the software instance itself as well as the data it creates and operates on.
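To make concrete what storing the software instance alongside its data could involve, here is a minimal, illustrative sketch in Python (ours, not the authors'; the output file name and record fields are assumptions). It captures the interpreter version, the installed packages, and the source revision that produced a data product, and writes that provenance record next to the data, so the numbers keep their computational context.

  # Illustrative sketch: record the software environment that produced a
  # data file. The output file name is hypothetical.
  import json
  import platform
  import subprocess
  import sys

  def capture_provenance():
      """Collect a minimal description of the software that produced the data."""
      try:
          # Exact source revision, if the analysis ran from a git checkout.
          commit = subprocess.check_output(
              ["git", "rev-parse", "HEAD"], text=True
          ).strip()
      except (subprocess.CalledProcessError, FileNotFoundError):
          commit = "unknown"  # not a git checkout, or git not installed
      return {
          "python": sys.version,
          "platform": platform.platform(),
          "git_commit": commit,
          # Pinned package list, so the environment can be rebuilt later.
          "packages": subprocess.check_output(
              [sys.executable, "-m", "pip", "freeze"], text=True
          ).splitlines(),
      }

  # Write the provenance record next to the data product it describes.
  with open("results.provenance.json", "w") as f:
      json.dump(capture_provenance(), f, indent=2)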


References

  1. Higher Education Funding Council for England, Research Councils UK, Universities UK, Wellcome (28 July 2016). "Concordat on Open Research Data" (PDF). https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/. 
  2. Directorate-General for Research & Innovation (26 July 2016). "Guidelines on FAIR Data Management in Horizon 2020" (PDF). H2020 Programme. European Commission. https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 12 August 2019. 
  3. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175. 
  4. Lyon, L.; Jeng, W.; Mattern, E. (2017). "Research Transparency: A Preliminary Study of Disciplinary Conceptualisation, Drivers, Tools and Support Services". International Journal of Digital Curation 12 (1): 46–64. doi:10.2218/ijdc.v12i1.530. 
  5. Chen, X.; Dallmeier-Tiessen, S.; Dasler, R. et al. (2019). "Open is not enough". Nature Physics 15: 113–19. doi:10.1038/s41567-018-0342-2. 
  6. Mesnard, O.; Barba, L.A. (2017). "Reproducible and Replicable Computational Fluid Dynamics: It’s Harder Than You Think". Computing in Science & Engineering 19 (4): 44–55. doi:10.1109/MCSE.2017.3151254. 
  7. Allison, D.B.; Shiffrin, R.M.; Stodden, V. (2018). "Reproducibility of research: Issues and proposed remedies". Proceedings of the National Academy of Sciences of the United States of America 115 (11): 2561–62. doi:10.1073/pnas.1802324115. PMC 5856570. PMID 29531033. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5856570. 

Notes

This presentation is faithful to the original, with only a few minor changes. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.