Full article title: Data without software are just numbers
Journal: Data Science Journal
Author(s): Davenport, James H.; Grant, James; Jones, Catherine M.
Author affiliation(s): University of Bath, Science and Technology Facilities Council
Primary contact (email): J dot H dot Davenport at bath dot ac dot uk
Year published: 2020
Volume and issue: 19(1)
Article #: 3
DOI: 10.5334/dsj-2020-003
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International
Website: https://datascience.codata.org/articles/10.5334/dsj-2020-003/
Download: https://datascience.codata.org/articles/10.5334/dsj-2020-003/galley/929/download/ (PDF)

Abstract

Great strides have been made to encourage researchers to archive data created by research and provide the necessary systems to support their storage. Additionally, it is recognized that data are meaningless unless their provenance is preserved, through appropriate metadata. Alongside this is a pressing need to ensure the quality and archiving of the software that generates data, through simulation and control of experiment or data collection, and that which analyzes, modifies, and draws value from raw data. In order to meet the aims of reproducibility, we argue that data management alone is insufficient: it must be accompanied by good software practices, the training to facilitate it, and the support of stakeholders, including appropriate recognition for software as a research output.

Keywords: software citation, software management, reproducibility, archiving, research software engineer

Introduction

In the last decade, there has been a drive towards improved research data management in academia, moving away from the model of "supplementary material" that did not fit in publications, to the requirement that all data supporting research be made available at the time of publication. In the U.K., for example, the Research Councils have a Concordat on Open Research Data[1], and the E.U.’s Horizon 2020 program incorporates similar policies on data availability.[2] The FAIR principles[3]—which state that data should be findable, accessible, interoperable, and re-usable—embody the underlying philosophy: data should be preserved through archiving with a persistent identifier, well described with suitable metadata, and managed in a way that is relevant to the domain. Together with the open access movement, this has brought a profound transformation in the availability of research and the data supporting it.
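
To make these requirements concrete, the following is a minimal sketch of the kind of metadata record that might accompany an archived dataset. The field names loosely echo common repository schemas (e.g., DataCite), but they, the identifier, and all values are illustrative placeholders; the actual schema depends on the chosen archive.

    # Illustrative only: a minimal metadata record for an archived dataset.
    # Field names loosely follow common repository schemas (e.g., DataCite);
    # the DOI and all values are placeholders, not a real deposit.
    metadata = {
        "identifier": "10.5281/zenodo.0000000",  # persistent identifier (DOI)
        "title": "Example dataset supporting a publication",
        "creators": ["Researcher, A.", "Researcher, B."],
        "publicationYear": 2020,
        "resourceType": "Dataset",
        "rights": "CC BY 4.0",
        "description": "Domain-relevant description that enables re-use.",
    }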

While this is a great stride towards transparency, it does not by itself improve the quality of research, and even what exactly transparency entails remains debated.[4] A common theme discussed in many disciplines is the need for a growing emphasis on "reproducibility."[5][6][7] This goes beyond data itself, requiring software and analysis pipelines to be published in a usable state alongside papers. Spreading such good practices requires a coordinated effort towards training in professional programming methods in academia, recognizing the role of research software and the effort required to develop it, and storing the software instance itself as well as the data it creates and operates on.

In the next section of this article we discuss two cases where the use of spreadsheets highlights the need for programmatic approaches to analysis; in the subsequent section we review the research software engineer movement, which now has nascent organizations internationally. While some domains are adopting good practices, and indeed are at the forefront of developing them, the sector-wide approaches needed to support their uptake are generally lacking; we discuss this issue in the penultimate section. We close by summarizing how data librarians and research software engineers need to work with researchers to continue to improve the situation.

When analysis "goes wrong"

The movement towards reproducible research is driven by the belief that reviewers and readers should be able to verify and readily validate the analysis workflows supporting publications. Rather than being viewed as questioning academic rigor, this concept should be embraced as a vital part of the research cycle. Here we discuss two examples illustrating how oversights that ultimately should have been avoidable can cause serious issues.

How not to Excel ... at economics

Reinhart and Rogoff’s now notorious 2010 paper reported a headline figure of a 0.1% contraction for economies with debt above 90% of GDP.[8] A number of issues with their work were raised by Herndon, Ash, and Pollin[9], who were unable to reproduce the results—despite the raw data being published—because Reinhart and Rogoff's method was not fully described. Further, when the spreadsheet used for the calculation was analyzed, it was found that five countries (Australia, Austria, Belgium, Canada, and Denmark) had been incorrectly omitted from the analysis. Together with the methodological issues, the revised analysis showed 2.2% growth.
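
The omitted rows are exactly the kind of error a programmatic workflow can surface automatically. The following is a minimal sketch in Python (the countries and growth figures are invented for illustration, not Reinhart and Rogoff's data): because the mean is computed over the whole table and an assertion checks the expected sample, silently dropping five countries, as a misdefined spreadsheet range can, fails loudly instead.

    import pandas as pd

    # Invented data for illustration: growth rates of high-debt economies.
    data = pd.DataFrame({
        "country": ["Australia", "Austria", "Belgium", "Canada", "Denmark",
                    "Greece", "Ireland", "Italy", "Japan", "U.S."],
        "growth_pct": [3.8, 3.0, 2.6, 2.2, 3.1, 2.9, 2.4, 1.0, 0.7, -2.0],
    })

    # In practice the expected sample comes from the study design, not the data.
    EXPECTED_COUNTRIES = set(data["country"])

    def mean_growth(df):
        """Mean growth across the full sample of high-debt economies."""
        # Fail loudly if any expected country is missing, rather than
        # silently averaging a truncated range as a spreadsheet can.
        missing = EXPECTED_COUNTRIES - set(df["country"])
        assert not missing, f"countries omitted from analysis: {missing}"
        return df["growth_pct"].mean()

    print(f"Mean growth: {mean_growth(data):.2f}%")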

The mistakes received particular attention, with numerous articles published on the topic (e.g., Borwein and Bailey's 2013 piece[10]), since the original paper had been used to justify austerity policies aimed at cutting debt in the U.S., U.K., and E.U., as well as within the International Monetary Fund (IMF). The reliance of the proponents of these policies—and their economic and geopolitical results—on a flawed analysis should act as a stark warning that all researchers need to guard against error and embrace transparency.

How not to Excel ... with genes

When files are opened in Microsoft Excel, the default behavior is to infer data types. While this may benefit general users, it is not always helpful: for example, the gene symbols SEPT2 and MARCH1 are converted into dates, while certain identifiers (e.g., 2310009E13) are converted to floating point numbers. Although this behavior has been known since 2004, a 2016 study by Ziemann, Eren, and El-Osta found that the issue continues to affect papers, as identified through their supplementary data. The numbers affected have typically increased year-on-year, with 20% of papers affected on average, rising to over 30% in Nature. The problem is sufficiently mature and pervasive that a service has been developed to identify affected spreadsheets[11], yet it continues to occur.
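
Reading such files programmatically sidesteps Excel's type inference entirely. As a minimal sketch (the file name and column names here are hypothetical), forcing every column to be read as a string preserves identifiers such as SEPT2, MARCH1, and 2310009E13 exactly as written, and any numeric conversion is then performed deliberately:

    import pandas as pd

    # Read every column as text so gene symbols (e.g., "SEPT2", "MARCH1")
    # are not reinterpreted as dates, and identifiers like "2310009E13"
    # are not parsed as floats in scientific notation.
    genes = pd.read_csv("gene_list.csv", dtype=str)

    # Convert only the columns that are genuinely numeric, one by one,
    # leaving identifier columns untouched.
    genes["expression"] = pd.to_numeric(genes["expression"])

    print(genes.dtypes)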

Research software

While we stress that non-programmatic approaches such as the use of spreadsheets do not of themselves cause errors, they do compromise the ability to test and reproduce analysis workflows. Further, the publication of software is part of a wider program of transparency and open access.[12] However, if even these relatively simple issues occur, we must find ways of identifying and avoiding all manner of problems with data analysis, data collection, and experiment operation. Publishing software also makes deliberately obfuscated methods easier to identify and discuss with authors at review.

Increasingly, research across disciplines depends upon software: for experimental control and instrumentation, for simulating models and analyzing data, and for turning numbers into figures. It is vital that bespoke software be published alongside the journal article and the data it supports. While publication does not ensure that code is correct, it does enable the reproducibility of analysis and allows experimental workflows to be checked and validated against correct or "expected" behavior. Making code available and employing good practice in its development should be the default, whether it be a million lines of community code or a short analysis script.
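
Validation against "expected" behavior need not be elaborate. As a minimal sketch (the analysis function and its expected values are invented for illustration), a short analysis script can ship with a unit test that encodes its known-good output, so that reviewers and readers can re-run the check:

    import unittest

    def normalize(values):
        """Scale a series so it sums to 1.0; a stand-in analysis step."""
        total = sum(values)
        if total == 0:
            raise ValueError("cannot normalize an all-zero series")
        return [v / total for v in values]

    class TestNormalize(unittest.TestCase):
        def test_expected_behavior(self):
            # Encodes the documented, expected output of the routine.
            result = normalize([2.0, 3.0, 5.0])
            self.assertAlmostEqual(sum(result), 1.0)
            self.assertAlmostEqual(result[0], 0.2)

        def test_rejects_degenerate_input(self):
            with self.assertRaises(ValueError):
                normalize([0.0, 0.0])

    if __name__ == "__main__":
        unittest.main()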

The Research Software Engineer movement grew out of a working group of the Software Sustainability Institute[13] (SSI), which has since been a strong supporter of the U.K. Research Software Engineer Association (UKRSEA), now known as the Society of Research Software Engineering (RSE).[14] The aim has been to improve the sustainability, quality, and recognition of research software by advocating good software practice (see, e.g., Wilson et al.[15]) and career progression for its developers. Its work has resulted in recognition of the role by funders and fellowship schemes, as well as growing recognition of software as a vital part of e-infrastructure. Its success has spawned sister organizations internationally in Germany, the Netherlands, Scandinavia, and the U.S.

A 2014 survey by the SSI showed that 92% of researchers used research software, and that 69% would not be able to conduct their research without it.[16] Research software was defined as that used to generate, process, or analyze results for publication. Furthermore, 56% of researchers developed software, of whom 21% had never received any form of software training. It is clear that software underpins modern research and that many researchers are involved in development, even if it is not their primary activity.

Programmatic approaches to analysis and plotting allow for greater transparency, deliver efficiencies for researchers in academia, and, with formal training, improve employability in industry. Their adoption is further motivated by the requirements of funders and journals, which increasingly require, or at least encourage (see, e.g., the Association for Computing Machinery[17]), publication of software. This evolving landscape requires a rapid and connected response from researchers, data managers, and research software engineers if institutions are to improve software development practices in a sustainable way.


References

  1. Higher Education Funding Council for England, Research Councils UK, Universities UK, Wellcome (28 July 2016). "Concordat on Open Research Data" (PDF). https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/. 
  2. Directorate-General for Research & Innovation (26 July 2016). "Guidelines on FAIR Data Management in Horizon 2020" (PDF). H2020 Programme. European Commission. https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 12 August 2019. 
  3. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175. 
  4. Lyon, L.; Jeng, W.; Mattern, E. (2017). "Research Transparency: A Preliminary Study of Disciplinary Conceptualisation, Drivers, Tools and Support Services". International Journal of Digital Curation 12 (1): 46–64. doi:10.2218/ijdc.v12i1.530. 
  5. Chen, X.; Dallmeier-Tiessen, S.; Dasler, R. et al. (2019). "Open is not enough". Nature Physics 15: 113–19. doi:10.1038/s41567-018-0342-2. 
  6. Mesnard, O.; Barba, L.A. (2017). "Reproducible and Replicable Computational Fluid Dynamics: It’s Harder Than You Think". Computing in Science & Engineering 19 (4): 44–55. doi:10.1109/MCSE.2017.3151254. 
  7. Allison, D.B.; Shiffrin, R.M.; Stodden, V. (2018). "Reproducibility of research: Issues and proposed remedies". Proceedings of the National Academy of Sciences of the United States of America 115 (11): 2561–62. doi:10.1073/pnas.1802324115. PMC 5856570. PMID 29531033. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5856570. 
  8. Reinhart, C.M.; Rogoff, K.S. (2010). "Growth in a Time of Debt". American Economic Review 100 (2): 573–78. doi:10.1257/aer.100.2.573. 
  9. Herndon, T.; Ash, M.; Pollin, R. (2013). "Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff". Cambridge Journal of Economics 38 (2): 257–279. doi:10.1093/cje/bet075. 
  10. Borwein, J.; Bailey, D.H. (22 April 2013). "The Reinhart-Rogoff error – or how not to Excel at economics". The Conversation. https://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646. 
  11. Mallona, I.; Peinado, M.A. (2017). "Truke, a web tool to check for and handle excel misidentified gene symbols". BMC Genomics 18 (1): 242. doi:10.1186/s12864-017-3631-8. PMC 5359807. PMID 28327106. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5359807. 
  12. Munafò, M.R.; Nosek, B.A.; Bishop, D.V.M. et al. (2017). "A manifesto for reproducible science". Nature Human Behaviour 1: 0021. doi:10.1038/s41562-016-0021. 
  13. Software Sustainability Institute. "Software Sustainability Institute". https://www.software.ac.uk/. 
  14. Society of Research Software Engineering. "RSE Society of Research Software Engineering". http://rse.ac.uk/. 
  15. Wilson, G.; Bryan, J.; Cranston, K. et al. (2017). "Good enough practices in scientific computing". PLoS Computational Biology 13 (6): e1005510. doi:10.1371/journal.pcbi.1005510. PMC 5480810. PMID 28640806. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5480810. 
  16. Hettrick, S.; Antonioletti, M.; Carr, L. et al. (2014). "UK Research Software Survey 2014". Zenodo. doi:10.5281/zenodo.14809. 
  17. Association for Computing Machinery (2018). "Software and Data Artifacts in the ACM Digital Library". https://www.acm.org/publications/artifacts. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.