Difference between revisions of "User:Shawndouglas/sandbox/sublevel13"

From LIMSWiki


Revision as of 00:45, 8 May 2024

Sandbox begins below

Title: Why are the FAIR data principles increasingly important to research laboratories and their software?

Author for citation: Shawn E. Douglas

License for content: Creative Commons Attribution-ShareAlike 4.0 International

Publication date: May 2024

Introduction

The growing importance of the FAIR principles to research laboratories

The FAIR data principles were published by Wilkinson et al. in 2016 as a stakeholder collaboration driven to see research "objects" (i.e., research data and information of all shapes and formats) become more universally findable, accessible, interoperable, and reusable (FAIR) by both machines and people.[1] The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."[1] Since their publication, other researchers have taken this somewhat broad set of principles and refined it for their own scientific disciplines, as well as for other types of research objects, including the research software being used by those researchers to generate research objects.[2][3][4][5][6][7]

But why are research laboratories increasingly pushing for more findable, accessible, interoperable, and reusable research objects and software? The short answer, as evidenced by the Wilkinson et al. quote above, is that greater innovation can be gained through improved knowledge discovery. The discovery process necessary for that greater innovation, whether through traditional research methods or artificial intelligence (AI)-driven methods, is enhanced when research objects and software are compatible with the core ideas of FAIR.[1][8][9]

A slightly longer answer, suitable for a Q&A topic, requires looking at a few more details of the FAIR principles as applied to both research objects and research software. Research laboratories, whether located in an organization or contracted out as third parties, exist to innovate. That innovation can come in the form of discovering new materials that may or may not have a future application, developing a pharmaceutical to improve patient outcomes for a particular disease, or modifying (for some sort of improvement) an existing food or beverage recipe, among others. In academic research labs, this usually looks like knowledge advancement and the publishing of research results, whereas in industry research labs, this typically looks like more practical applications of research concepts to new or existing products or services. In both cases, research software was likely involved at some point, whether it be something like a researcher-developed bioinformatics application or a commercial vendor-developed electronic laboratory notebook (ELN).

FAIR research objects

Regarding research objects themselves, the FAIR principles essentially say "vast amounts of data and information in largely heterogeneous formats spread across disparate sources both electronic and paper make modern research workflows difficult, tedious, and at times impossible. Further, repeatability, reproducibility, and replicability of openly published or secure internal research results is at risk, giving less confidence to academic peers in the published research, or less confidence to critical stakeholders in the viability of a researched prototype." As such, research objects (which include not only their inherent data and information but also any metadata that describe features of that data and information) need to be[10]:

  • findable, with globally unique and persistent identifiers, rich metadata that link to the identifier of the data described, and an ability to be indexed as an effectively searchable resource;
  • accessible, being able to be retrieved (including metadata of data that is no longer available) by identifiers using secure standardized communication protocols that are open, free, and universally implementable with authentication and authorization mechanisms;
  • interoperable, represented using formal, accessible, shared, and relevant language models and vocabularies that abide by FAIR principles, as well as with qualified linkage to other metadata; and
  • reusable, being richly described by accurate and relevant metadata, released with a clear and accessible data usage license, associated with sufficiently detailed provenance information, and compliant with discipline-specific community standards.
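The four requirements above can be read as a checklist against a research object's metadata record. The following is a minimal sketch of that idea in Python, not an official FAIR assessment method: the `ResearchObjectMetadata` class, its field names, the `fair_gaps` helper, and the example DOI and URL are all hypothetical and chosen only for illustration. A real assessment would also need to verify that identifiers resolve, that vocabularies are community-recognized, and so on.

```python
from dataclasses import dataclass


@dataclass
class ResearchObjectMetadata:
    """Hypothetical metadata record for a single research object."""
    identifier: str = ""   # globally unique, persistent identifier (e.g., a DOI)
    title: str = ""
    description: str = ""
    access_url: str = ""   # where the object, or at least its metadata, can be retrieved
    vocabulary: str = ""   # shared vocabulary or ontology used to describe the object
    license: str = ""      # data usage license
    provenance: str = ""   # how, when, and by whom the object was produced


# Assumed mapping of each FAIR facet to the fields that support it.
REQUIRED_FIELDS = {
    "findable": ["identifier", "title", "description"],
    "accessible": ["access_url"],
    "interoperable": ["vocabulary"],
    "reusable": ["license", "provenance"],
}


def fair_gaps(record):
    """Return {facet: [missing field names]} for every facet with empty fields."""
    gaps = {}
    for facet, names in REQUIRED_FIELDS.items():
        missing = [name for name in names if not getattr(record, name).strip()]
        if missing:
            gaps[facet] = missing
    return gaps


# Placeholder values only; the DOI and URL below do not refer to real objects.
record = ResearchObjectMetadata(
    identifier="doi:10.1234/example",
    title="Example assay results",
    description="Plate-reader output for a hypothetical assay",
    access_url="https://repository.example.org/objects/42",
    license="CC-BY-4.0",
)
print(fair_gaps(record))
```

In this sketch the record lacks a declared vocabulary and provenance information, so the helper reports gaps under "interoperable" and "reusable" only.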

All that talk of unique persistent identifiers, communication protocols, authentication mechanisms, language models (e.g., ontology languages), standardized vocabularies, provenance information, and more could make one's head spin. And, to be fair, it has been challenging for research groups to adopt FAIR, with few widespread international efforts to translate the FAIR principles to broad research. The FAIR Cookbook represents one example of such an international collaborative effort, providing "a combination of guidance, technical, hands-on, background and review types to cover the operation steps of FAIR data management."[11] In fact, the Cookbook is illustrative of the challenges of implementing FAIR in research laboratories, particularly given the diverse array of vocabularies used across the wealth of scientific disciplines, such as biobanking, biomedical engineering, botany, food science, and materials science. A botanical research organization will need a different set of tools to make its research objects FAIR than a materials science research organization will. But all of them will turn to informatics tools, data management plans, database tools, and more to not only massage existing research objects to be FAIR but also better ensure newly created research objects are FAIR as well.

FAIR research software

Discussion on research software and its FAIRness is more complicated. It is beyond the scope of this article to go into greater detail about the concepts surrounding FAIR research software, but a brief overview will be attempted. When the FAIR principles were first published, the framework was largely being applied to research objects. However, researchers quickly recognized that any planning around updating processes and systems to make research objects more FAIR would have to be tailored to specific research contexts. This led to recognizing that digital research objects go beyond data and information, and that there is a "specific nature of software" used in research; that is, research software should not be considered "just data."[4] The end result has seen researchers begin to apply the core concepts of FAIR to research software, but slightly differently from research objects.[2][3][4][5][6][7]

Unsurprisingly, what researchers consider to be "research software" for purposes of FAIR has historically been interpreted numerous ways. Does the commercial spreadsheet software used to make calculations on research data deserve to be called research software in parallel with the lab-developed bioinformatics application used to generate that data? Given the difficulties of gaining a consensus definition of the term, a 2021 international initiative called FAIRsFAIR made a good-faith effort to define "research software" with the feedback of multiple stakeholders. The short version of their resulting definition is that "[r]esearch software includes source code files, algorithms, scripts, computational workflows, and executables that were created during the research process, or for a research purpose."[12] Of note is the last part, acknowledging that research software can be developed in the lab during the research process or developed beforehand by, for example, a commercial software developer with a strong purpose of being used for research. As such, Microsoft Excel may not be looked upon as research software, but an ELN or laboratory information management system (LIMS) thoughtfully developed with research activities in mind could be considered research software. More often than not, that software is going to be developed in-house.

A growing push for the FAIRification of that software, as well as commercial research solutions, has seen the emergence of "research software engineering" as a domain of practice.[13][14] While in the past, broadly speaking, researchers often cobbled together research software with less of a focus on quality and reproducibility and more on getting their research published, today's push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed a much greater focus on the concept of "better software, better research,"[14][15] with research software engineering efforts focusing on that concept as being vital to future research outcomes. Cohen et al. add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."[15]

Hasselbring et al. note that "it is essential [for academic research groups] to publish research software in addition to research data," to increase trust in the peer review system, build new research on top of existing research, and ensure greater reproducibility of any published results.[3] They extend FAIR data principles to FAIR research software, noting that[3]:

  • findable software acknowledges that "the first step in (re)using ... software is to find it";
  • accessible software acknowledges that once found, the researcher needs to know how to best access the software, recognizing authentication or authorization mechanisms may need to be in place;
  • interoperable software acknowledges that the software will need to eventually integrate with other research objects and software, demanding FAIR-driven methods and tools in the software's development; and
  • reusable software acknowledges that the software will need to not only produce research objects that can be reused, combined, and extended, but that the software itself should have metadata that helps make it retrievable and reusable.

The applicability of these principles is clear for academic research software developed in-house, with the concept of open science driving FAIR development and release of that software.[3] It's less clear for commercial developers making research software. The growing prevalence of FAIR data and software practices in research laboratories doesn't mean commercial developers are going to suddenly take an open-source approach to their code, and it doesn't mean academic and institutional research labs are going to give up the benefits of the open-source paradigm as applied to research software.[3] However, both research software development paradigms stand to gain from the shift to more FAIR data and software.[13] Additionally, if commercial vendors of research software want to continue to competitively market relevant and sustainable research software to research labs, they have little choice but to commit extra resources to learning how FAIR principles apply to the offerings they tailor to those labs.

FAIRer research objects + better software = the potential for greater innovation

References

  1. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC 4792175. PMID 26978244. https://www.nature.com/articles/sdata201618.
  2. "fair data principles". PubMed Search. National Institutes of Health, National Library of Medicine. https://pubmed.ncbi.nlm.nih.gov/?term=fair+data+principles. Retrieved 30 April 2024.
  3. Hasselbring, Wilhelm; Carr, Leslie; Hettrick, Simon; Packer, Heather; Tiropanis, Thanassis (25 February 2020). "From FAIR research data toward FAIR and open research software" (in en). it - Information Technology 62 (1): 39–47. doi:10.1515/itit-2019-0040. ISSN 2196-7032. https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html.
  4. Gruenpeter, M. (23 November 2020). "FAIR + Software: Decoding the principles" (PDF). FAIRsFAIR “Fostering FAIR Data Practices In Europe”. https://www.fairsfair.eu/sites/default/files/FAIR%20%2B%20software.pdf. Retrieved 30 April 2024.
  5. Barker, Michelle; Chue Hong, Neil P.; Katz, Daniel S.; Lamprecht, Anna-Lena; Martinez-Ortiz, Carlos; Psomopoulos, Fotis; Harrow, Jennifer; Castro, Leyla Jael et al. (14 October 2022). "Introducing the FAIR Principles for research software" (in en). Scientific Data 9 (1): 622. doi:10.1038/s41597-022-01710-x. ISSN 2052-4463. PMC 9562067. PMID 36241754. https://www.nature.com/articles/s41597-022-01710-x.
  6. Patel, Bhavesh; Soundarajan, Sanjay; Ménager, Hervé; Hu, Zicheng (23 August 2023). "Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool" (in en). Scientific Data 10 (1): 557. doi:10.1038/s41597-023-02463-x. ISSN 2052-4463. PMC 10447492. PMID 37612312. https://www.nature.com/articles/s41597-023-02463-x.
  7. Du, Xinsong; Dastmalchi, Farhad; Ye, Hao; Garrett, Timothy J.; Diller, Matthew A.; Liu, Mei; Hogan, William R.; Brochhausen, Mathias et al. (6 February 2023). "Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software" (in en). Metabolomics 19 (2): 11. doi:10.1007/s11306-023-01974-3. ISSN 1573-3890. https://link.springer.com/10.1007/s11306-023-01974-3.
  8. Olsen, C. (1 September 2023). "Embracing FAIR Data on the Path to AI-Readiness". Pharma's Almanac. https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness. Retrieved 3 May 2024.
  9. Huerta, E. A.; Blaiszik, Ben; Brinson, L. Catherine; Bouchard, Kristofer E.; Diaz, Daniel; Doglioni, Caterina; Duarte, Javier M.; Emani, Murali et al. (26 July 2023). "FAIR for AI: An interdisciplinary and international community building perspective" (in en). Scientific Data 10 (1): 487. doi:10.1038/s41597-023-02298-6. ISSN 2052-4463. PMC 10372139. PMID 37495591. https://www.nature.com/articles/s41597-023-02298-6.
  10. Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Gu, Wei; Welter, Danielle; Abbassi Daloii, Tooba; Portell-Silva, Laura (30 June 2022). "Introducing the FAIR Principles". D2.1 FAIR Cookbook. doi:10.5281/ZENODO.6783564. https://zenodo.org/record/6783564.
  11. Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Gu, Wei; Welter, Danielle; Abbassi Daloii, Tooba; Portell-Silva, Laura (30 June 2022). "Introduction". D2.1 FAIR Cookbook. doi:10.5281/ZENODO.6783564. https://zenodo.org/record/6783564.
  12. Gruenpeter, Morane; Katz, Daniel S.; Lamprecht, Anna-Lena; Honeyman, Tom; Garijo, Daniel; Struck, Alexander; Niehues, Anna; Martinez, Paula Andrea et al. (13 September 2021). "Defining Research Software: a controversial discussion". Zenodo. doi:10.5281/zenodo.5504016. https://zenodo.org/record/5504016.
  13. Moynihan, G. (7 July 2020). "The Hitchhiker’s Guide to Research Software Engineering: From PhD to RSE". Invenia Blog. Invenia Technical Computing Corporation. https://invenia.github.io/blog/2020/07/07/software-engineering/.
  14. Woolston, Chris (31 May 2022). "Why science needs more research software engineers" (in en). Nature: d41586–022–01516-2. doi:10.1038/d41586-022-01516-2. ISSN 0028-0836. https://www.nature.com/articles/d41586-022-01516-2.
  15. Cohen, Jeremy; Katz, Daniel S.; Barker, Michelle; Chue Hong, Neil; Haines, Robert; Jay, Caroline (1 January 2021). "The Four Pillars of Research Software Engineering". IEEE Software 38 (1): 97–105. doi:10.1109/MS.2020.2973362. ISSN 0740-7459. https://ieeexplore.ieee.org/document/8994167/.