Difference between revisions of "User:Shawndouglas/sandbox/sublevel13"

From LIMSWiki
Tag: Reverted
(24 intermediate revisions by the same user not shown)
Line 8: Line 8:
==Sandbox begins below==
<div class="nonumtoc">__TOC__</div>
'''Title''': ''Why are the FAIR data principles increasingly important to research laboratories and their software?''


==1. What is cloud computing?==
'''Author for citation''': Shawn E. Douglas
[[File:1000px-Cloud computing.svg.png|right|650px|thumb|'''Figure 1.''' A basic visualization of [[cloud computing]] architecture layers and some of the activities that take place on those layers.]]
If you were alive in the late 2000s and doing almost anything related to computers and the internet, you were bound to encounter the latest internet buzzword: [[cloud computing]].<ref name="PogueInSync08">{{cite web |url=https://www.nytimes.com/2008/07/17/technology/personaltech/17pogue.html |archiveurl=https://web.archive.org/web/20180105205750/https://www.nytimes.com/2008/07/17/technology/personaltech/17pogue.html |title=In Sync to Pierce the Cloud |author=Pogue, D. |work=The New York Times |date=17 July 2008 |archivedate=05 January 2018 |accessdate=28 July 2023}}</ref><ref name="WangCloud10">{{cite journal |title=Cloud Computing: A Perspective Study |journal=New Generation Computing |author=Wang, L.; von Laszewski, G.; Younge, A. et al. |volume=28 |pages=137–46 |year=2010 |doi=10.1007/s00354-008-0081-5 |url=https://scholarworks.rit.edu/cgi/viewcontent.cgi?article=1748&context=other}}</ref> A certain mysticism was seemingly attached to the concept, that your files and applications could reside on the internet, "out there in the 'cloud.'"<ref name="PogueInSync08" /> "But what is this 'cloud'?" many would ask. A plethora of media articles, journal articles, blogs, and company websites were published to give practically everyone's take on what the cloud was and wasn't meant to be.<ref name="ChamberlinCloud08">{{cite web |url=https://www.billchamberlin.com/cloud-computing-what-is-it/ |title=Cloud Computing: What is it? |author=Chamberlin, B. |work=BillChamberlin.com |date=28 October 2008 |accessdate=28 July 2023}}</ref> However, the then-growing consensus on cloud computing as a networked and scalable architecture meant to rapidly provide application and infrastructure services at reasonable prices to internet users<ref name="WangCloud10" /><ref name="ChamberlinCloud08" /> largely matches up with today's definition.
Pulling from both The Institution of Engineering and Technology<ref name="FrenchCloud21">{{cite web |url=https://www.theiet.org/publishing/inspec/researching-hot-topics/cloud-computing-and-web-services/ |title=Cloud computing and web services |author=French, J. |publisher=The Institution of Engineering and Technology |date=2021 |accessdate=28 July 2023}}</ref> and Amazon Web Services<ref name="AWSCloud21">{{cite web |url=https://aws.amazon.com/what-is/cloud-native/ |title=What Is Cloud Native? |publisher=Amazon Web Services |accessdate=28 July 2023}}</ref>, we can define cloud computing as:


<blockquote>an internet-based computing paradigm in which standardized and [[Virtualization|virtualized]] resources are used to rapidly, elastically, and cost-effectively provide a variety of globally available, "always-on" computing services to users on a continuous or as-needed basis</blockquote>
'''License for content''': [https://creativecommons.org/licenses/by-sa/4.0/ Creative Commons Attribution-ShareAlike 4.0 International]


Of course, those computing services come in a variety of flavors, the most common being [[Software as a service|software]], [[Platform as a service|platform]], and [[Infrastructure as a service|infrastructure]] "as a service" (SaaS, PaaS, and IaaS, respectively). These conveniently correspond to the underlying architectural layers of the services, with infrastructure at the base, platform on top of that, and software (or application) on top of that.
'''Publication date''': May 2024


Figure 1 portrays a simplified visualization of cloud computing architecture layers, as well as examples of activities that happen on those layers. Others have visualized these layers as pyramids or pancake stacks, but the concept remains the same. At the base is the computing infrastructure, including the physical data centers and their networking equipment, servers, [[hypervisor]]s, [[application programming interface]]s (APIs), and operating systems. This infrastructure not only supports the applications users want to run but also acts as the development foundation for users who do not want to implement their own infrastructure. On top of that sit platforms or [[middleware]], which serve as software development and deployment environments (including databases, web servers, and load balancers) or as connectivity tools for analytics, [[workflow]] management, system integration, and security management. And on top of that are applications, typically designed to run optimally in cloud environments and accessed via web browsers or apps using internet (i.e., networking) connectivity and computing devices.<ref name="MaurerCloud20">{{cite web |url=https://carnegieendowment.org/2020/08/31/cloud-security-primer-for-policymakers-pub-82597 |title=Cloud Security: A Primer for Policymakers |author=Maurer, T.; Hinck, G. |publisher=Carnegie Endowment for International Peace |date=31 August 2020 |accessdate=28 July 2023}}</ref>
==Introduction==


Customers who require application hosting, internet-hosted software development platforms, or underlying computing infrastructure (e.g., data storage, computational time), particularly when they can't or don't want to invest in their own hardware, are increasingly turning to the cloud computing paradigm. Even before a worldwide [[COVID-19]] [[pandemic]] started to take shape in late 2019, the global cloud services market was expected to reach $266.4 billion by the end of 2020, with Gartner expecting that to represent a 17 percent increase from 2019.<ref name="CostelloFore19">{{cite web |url=https://www.gartner.com/en/newsroom/press-releases/2019-11-13-gartner-forecasts-worldwide-public-cloud-revenue-to-grow-17-percent-in-2020 |title=Gartner Forecasts Worldwide Public Cloud Revenue to Grow 17% in 2020 |author=Costello, K.; Rimol, M. |publisher=Gartner |date=13 November 2019 |accessdate=28 July 2023}}</ref> As work-from-home practices expanded significantly in 2020 due to the pandemic, expectations that the trend would outlast the pandemic pushed estimates of the share of workloads moving from physical offices to the cloud to 55 percent by 2022, with the cloud services market reaching nearly $600 billion in 2023<ref name="GartnerFore23">{{cite web |url=https://www.gartner.com/en/newsroom/press-releases/2023-04-19-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-reach-nearly-600-billion-in-2023 |title=Gartner Forecasts Worldwide Public Cloud End-User Spending to Reach Nearly $600 Billion in 2023 |publisher=Gartner |date=19 April 2023 |accessdate=14 August 2023}}</ref> and $1 trillion by 2030.<ref name="ReinickeThree20">{{cite web |url=https://markets.businessinsider.com/news/stocks/wedbush-reasons-own-cloud-stocks-coronavirus-pandemic-tech-buy-2020-3-1029045273#2-the-move-to-cloud-will-accelerate-more-quickly-amid-the-coronavirus-pandemic2 |title=3 reasons one Wall Street firm says to stick with cloud stocks amid the coronavirus-induced market rout |author=Reinicke, C. |work=Market Insider |date=30 March 2020 |accessdate=28 July 2023}}</ref> This growing migration to cloud computing has many implications for organizations of all types, including [[Laboratory|laboratories]].
==The growing importance of the FAIR principles to research laboratories==
The [[Journal:The FAIR Guiding Principles for scientific data management and stewardship|FAIR data principles]] were published by Wilkinson ''et al.'' in 2016 as a stakeholder collaboration driven to see research "objects" (i.e., research data and [[information]] of all shapes and formats) become more universally findable, accessible, interoperable, and reusable (FAIR) by both machines and people.<ref name="WilkinsonTheFAIR16">{{Cite journal |last=Wilkinson |first=Mark D. |last2=Dumontier |first2=Michel |last3=Aalbersberg |first3=IJsbrand Jan |last4=Appleton |first4=Gabrielle |last5=Axton |first5=Myles |last6=Baak |first6=Arie |last7=Blomberg |first7=Niklas |last8=Boiten |first8=Jan-Willem |last9=da Silva Santos |first9=Luiz Bonino |last10=Bourne |first10=Philip E. |last11=Bouwman |first11=Jildau |date=2016-03-15 |title=The FAIR Guiding Principles for scientific data management and stewardship |url=https://www.nature.com/articles/sdata201618 |journal=Scientific Data |language=en |volume=3 |issue=1 |pages=160018 |doi=10.1038/sdata.2016.18 |issn=2052-4463 |pmc=PMC4792175 |pmid=26978244}}</ref> The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... 
is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."<ref name="WilkinsonTheFAIR16" /> Since the principles were published, other researchers have taken the somewhat broad set of principles and refined them to their own scientific disciplines, as well as to other types of research objects, including the research software being used by those researchers to generate research objects.<ref name="NIHPubMedSearch">{{cite web |url=https://pubmed.ncbi.nlm.nih.gov/?term=fair+data+principles |title=fair data principles |work=PubMed Search |publisher=National Institutes of Health, National Library of Medicine |accessdate=30 April 2024}}</ref><ref name="HasselbringFromFAIR20">{{Cite journal |last=Hasselbring |first=Wilhelm |last2=Carr |first2=Leslie |last3=Hettrick |first3=Simon |last4=Packer |first4=Heather |last5=Tiropanis |first5=Thanassis |date=2020-02-25 |title=From FAIR research data toward FAIR and open research software |url=https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html |journal=it - Information Technology |language=en |volume=62 |issue=1 |pages=39–47 |doi=10.1515/itit-2019-0040 |issn=2196-7032}}</ref><ref name="GruenpeterFAIRPlus20">{{Cite web |last=Gruenpeter, M. |date=23 November 2020 |title=FAIR + Software: Decoding the principles |url=https://www.fairsfair.eu/sites/default/files/FAIR%20%2B%20software.pdf |format=PDF |publisher=FAIRsFAIR “Fostering FAIR Data Practices In Europe” |accessdate=30 April 2024}}</ref><ref name=":0">{{Cite journal |last=Barker |first=Michelle |last2=Chue Hong |first2=Neil P. |last3=Katz |first3=Daniel S. 
|last4=Lamprecht |first4=Anna-Lena |last5=Martinez-Ortiz |first5=Carlos |last6=Psomopoulos |first6=Fotis |last7=Harrow |first7=Jennifer |last8=Castro |first8=Leyla Jael |last9=Gruenpeter |first9=Morane |last10=Martinez |first10=Paula Andrea |last11=Honeyman |first11=Tom |date=2022-10-14 |title=Introducing the FAIR Principles for research software |url=https://www.nature.com/articles/s41597-022-01710-x |journal=Scientific Data |language=en |volume=9 |issue=1 |pages=622 |doi=10.1038/s41597-022-01710-x |issn=2052-4463 |pmc=PMC9562067 |pmid=36241754}}</ref><ref name=":1">{{Cite journal |last=Patel |first=Bhavesh |last2=Soundarajan |first2=Sanjay |last3=Ménager |first3=Hervé |last4=Hu |first4=Zicheng |date=2023-08-23 |title=Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool |url=https://www.nature.com/articles/s41597-023-02463-x |journal=Scientific Data |language=en |volume=10 |issue=1 |pages=557 |doi=10.1038/s41597-023-02463-x |issn=2052-4463 |pmc=PMC10447492 |pmid=37612312}}</ref><ref name=":2">{{Cite journal |last=Du |first=Xinsong |last2=Dastmalchi |first2=Farhad |last3=Ye |first3=Hao |last4=Garrett |first4=Timothy J. |last5=Diller |first5=Matthew A. |last6=Liu |first6=Mei |last7=Hogan |first7=William R. |last8=Brochhausen |first8=Mathias |last9=Lemas |first9=Dominick J. |date=2023-02-06 |title=Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software |url=https://link.springer.com/10.1007/s11306-023-01974-3 |journal=Metabolomics |language=en |volume=19 |issue=2 |pages=11 |doi=10.1007/s11306-023-01974-3 |issn=1573-3890}}</ref>


But why are research laboratories increasingly pushing for more findable, accessible, interoperable, and reusable research objects and software? The short answer, as evidenced by the Wilkinson ''et al.'' quote above, is that greater innovation can be gained through improved knowledge discovery. The discovery process necessary for that greater innovation, whether through traditional research methods or [[artificial intelligence]] (AI)-driven methods, is enhanced when research objects and software are compatible with the core ideas of FAIR.<ref name="WilkinsonTheFAIR16" /><ref name="OlsenEmbracing23">{{cite web |url=https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness |title=Embracing FAIR Data on the Path to AI-Readiness |author=Olsen, C. |work=Pharma's Almanac |date=01 September 2023 |accessdate=03 May 2024}}</ref><ref name="HuertaFAIRForAI23">{{Cite journal |last=Huerta |first=E. A. |last2=Blaiszik |first2=Ben |last3=Brinson |first3=L. Catherine |last4=Bouchard |first4=Kristofer E. |last5=Diaz |first5=Daniel |last6=Doglioni |first6=Caterina |last7=Duarte |first7=Javier M. |last8=Emani |first8=Murali |last9=Foster |first9=Ian |last10=Fox |first10=Geoffrey |last11=Harris |first11=Philip |date=2023-07-26 |title=FAIR for AI: An interdisciplinary and international community building perspective |url=https://www.nature.com/articles/s41597-023-02298-6 |journal=Scientific Data |language=en |volume=10 |issue=1 |pages=487 |doi=10.1038/s41597-023-02298-6 |issn=2052-4463 |pmc=PMC10372139 |pmid=37495591}}</ref>


===1.1 History and evolution===
A slightly longer answer, suitable for a Q&A topic, requires looking at a few more details of the FAIR principles as applied to both research objects and research software. Research laboratories, whether located in an organization or contracted out as third parties, exist to innovate. That innovation can come in the form of discovering new materials that may or may not have a future application, developing a pharmaceutical to improve patient outcomes for a particular disease, or modifying (for some sort of improvement) an existing food or beverage recipe, among others. In academic research labs, this usually looks like knowledge advancement and the publishing of research results, whereas in industry research labs, this typically looks like more practical applications of research concepts to new or existing products or services. In both cases, research software was likely involved at some point, whether it be something like a researcher-developed [[bioinformatics]] application or a commercial vendor-developed [[electronic laboratory notebook]] (ELN).  
Cloud computing has its strongest origins in the "web services" phase of internet development. In November 2000, Mind Electric CEO and [[distributed computing]] visionary Graham Glass, writing for IBM, described web services as "building blocks for creating open distributed systems" that "allow companies and individuals to quickly and cheaply make their digital assets available worldwide," while prognosticating that web services "will catalyze a shift from client-server to peer-to-peer architectures."<ref name="GlassTheWeb00">{{cite web |url=http://www-106.ibm.com/developerworks/library/ws-peer1.html |archiveurl=https://web.archive.org/web/20010424015036/http://www-106.ibm.com/developerworks/library/ws-peer1.html |title=The Web services (r)evolution, Part 1: Applying Web services to applications |author=Glass, G. |work=IBM developerWorks |publisher=IBM |date=November 2000 |archivedate=24 April 2001 |accessdate=28 July 2023}}</ref> At that point, the likes of Microsoft and IBM were already developing toolkits for creating and deploying web services<ref name="GlassTheWeb00" />, with IBM releasing an initial high-level report in May 2001 on IBM's web services architecture approach. In that paper, web services were described by its author Heather Kreger as allowing "companies to reduce the cost of doing e-business, to deploy solutions faster, and to open up new opportunities," while also allowing "applications to be integrated more rapidly, easily, and less expensively than ever before."<ref name="KregerWeb01">{{cite web |url=https://www.researchgate.net/profile/Heather-Kreger/publication/235720479_Web_Services_Conceptual_Architecture_WSCA_10/links/563a67e008ae337ef2984607/Web-Services-Conceptual-Architecture-WSCA-10.pdf |format=PDF |title=Web Services Conceptual Architecture (WSCA 1.0) |author=Kreger, H. |date=May 2001 |publisher=IBM Software Group |accessdate=28 July 2023}}</ref>


Here's a recap of thinking on web services at the turn of the century:
===FAIR research objects===
Regarding research objects themselves, the FAIR principles essentially say, "vast amounts of data and information in largely heterogeneous formats spread across disparate sources both electronic and paper make modern research workflows difficult, tedious, and at times impossible. Further, repeatability, reproducibility, and replicability of openly published or secure internal research results are at risk, giving less confidence to academic peers in the published research, or less confidence to critical stakeholders in the viability of a researched prototype." As such, research objects (which include not only their inherent data and information but also any [[metadata]] that describe features of that data and information) need to be<ref name="Rocca-SerraFAIRCook22">{{Cite book |last=Rocca-Serra, Philippe |last2=Sansone, Susanna-Assunta |last3=Gu, Wei |last4=Welter, Danielle |last5=Abbassi Daloii, Tooba |last6=Portell-Silva, Laura |date=2022-06-30 |title=D2.1 FAIR Cookbook |url=https://zenodo.org/record/6783564 |chapter=Introducing the FAIR Principles |doi=10.5281/ZENODO.6783564}}</ref>:


* "[act as] building blocks for creating open distributed systems"<ref name="GlassTheWeb00" />
*''findable'', with globally unique and persistent identifiers, rich metadata that link to the identifier of the data described, and an ability to be indexed as an effectively searchable resource;
* "quickly and cheaply make ... digital assets available worldwide"<ref name="GlassTheWeb00" />
*''accessible'', being able to be retrieved (including metadata of data that is no longer available) by identifiers using secure standardized communication protocols that are open, free, and universally implementable with authentication and authorization mechanisms;
* "catalyze a shift from client-server to peer-to-peer architectures"<ref name="GlassTheWeb00" />
*''interoperable'', represented using formal, accessible, shared, and relevant language models and vocabularies that abide by FAIR principles, as well as with qualified linkage to other metadata; and
* "reduce the cost of doing e-business, to deploy solutions faster, and to open up new opportunities"<ref name="KregerWeb01" />
*''reusable'', being richly described by accurate and relevant metadata, released with a clear and accessible data usage license, associated with sufficiently detailed provenance information, and compliant with discipline-specific community standards.
* "[allow] applications to be integrated more rapidly, easily, and less expensively than ever before"<ref name="KregerWeb01" />
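
The four FAIR criteria for research objects listed above can be made a little more concrete with a minimal Python sketch. It assumes a simplified, hypothetical metadata record; the class name, field names, and the grouping of fields under each principle are illustrative assumptions rather than part of the FAIR principles or the FAIR Cookbook, and a real assessment would check far more than whether a field is filled in.

```python
from dataclasses import dataclass

# Hypothetical mapping of metadata fields to the FAIR principle they support.
# This grouping is an illustrative assumption, not an official FAIR checklist.
CHECKS = {
    "findable": ("identifier", "description"),
    "accessible": ("access_url",),
    "interoperable": ("vocabulary",),
    "reusable": ("license", "provenance"),
}


@dataclass
class ResearchObjectMetadata:
    """A simplified, hypothetical metadata record for one research object."""
    identifier: str = ""    # globally unique, persistent identifier (e.g., a DOI)
    description: str = ""   # descriptive metadata that aids search and indexing
    access_url: str = ""    # where the object is retrieved via a standard protocol
    vocabulary: str = ""    # shared vocabulary or ontology used to describe the data
    license: str = ""       # data usage license
    provenance: str = ""    # how, when, and by whom the data were produced


def fair_gaps(record):
    """Return the empty fields for each FAIR principle that has any."""
    gaps = {}
    for principle, names in CHECKS.items():
        missing = [name for name in names if not getattr(record, name).strip()]
        if missing:
            gaps[principle] = missing
    return gaps


# Example record with identifier and access details but no vocabulary,
# license, or provenance information.
record = ResearchObjectMetadata(
    identifier="doi:10.5281/zenodo.6783564",
    description="FAIR Cookbook deliverable",
    access_url="https://zenodo.org/record/6783564",
)
print(fair_gaps(record))
# {'interoperable': ['vocabulary'], 'reusable': ['license', 'provenance']}
```

A check like this only flags missing metadata; whether an identifier is truly persistent or a vocabulary is appropriate for a discipline still requires community standards and human judgment.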


We'll come back to that. For the next stop, however, we have to consider the case of Amazon and how they viewed web services at that time. Leading up to the twenty-first century, Amazon was beginning to expand beyond its book selling roots, opening up its marketplace to other third parties (affiliates) to sell their own goods on Amazon's platform. That effort required an expansion of IT infrastructure to support web-scale third-party selling, but as it turned out, a lot of that IT infrastructure, while reliable and cost-effective, had been previously added piecemeal, with many components getting "tangled" along the way. Amazon project leads and external partners were clamoring for better infrastructure services. This required untangling the IT and associated provider data into an internally scalable, centralized infrastructure that allowed for smoother communication and [[Information management|data management]] using well-documented APIs.<ref name="FurrierExclusive15">{{cite web |url=https://medium.com/@furrier/original-content-the-story-of-aws-and-andy-jassys-trillion-dollar-baby-4e8a35fd7ed |title=Exclusive: The Story of AWS and Andy Jassy’s Trillion Dollar Baby |author=Furrier, J. |work=Medium.com |date=29 January 2015 |accessdate=28 July 2023}}</ref><ref name="MillerHowAWS16">{{cite web |url=https://techcrunch.com/2016/07/02/andy-jassys-brief-history-of-the-genesis-of-aws/ |title=How AWS came to be |author=Miller, R. |work=TechCrunch |date=02 July 2016 |accessdate=28 July 2023}}</ref> By 2003, the company was indirectly acting as a services industry to its partners. "Why not act upon this strength?" was the sentiment that quickly developed that year, with Amazon choosing to use its internal compute, storage, and database infrastructure and related expertise to its advantage.<ref name="MillerHowAWS16" />
All that talk of unique persistent identifiers, communication protocols, authentication mechanisms, language models (e.g., [[ontology]] languages), standardized vocabularies, provenance information, and more could make one's head spin. And, to be fair, it has been challenging for research groups to adopt FAIR, with few widespread international efforts to translate the FAIR principles to broad research. The FAIR Cookbook represents one example of such an international collaborative effort, providing "a combination of guidance, technical, hands-on, background and review types to cover the operation steps of FAIR data management."<ref name="Rocca-SerraFAIRCook22-1">{{Cite book |last=Rocca-Serra, Philippe |last2=Sansone, Susanna-Assunta |last3=Gu, Wei |last4=Welter, Danielle |last5=Abbassi Daloii, Tooba |last6=Portell-Silva, Laura |date=2022-06-30 |title=D2.1 FAIR Cookbook |url=https://zenodo.org/record/6783564 |chapter=Introduction |doi=10.5281/ZENODO.6783564}}</ref> In fact, the Cookbook is illustrative of the challenges of implementing FAIR in research laboratories, particularly given the diverse array of vocabularies used across the wealth of scientific disciplines, such as [[biobanking]], [[biomedical engineering]], [[botany]], [[food science]], and [[materials science]]. A botanical research organization will require a different set of tools to make its research objects FAIR than a materials science research organization will. But all of them will turn to [[Informatics (academic field)|informatics]] tools, data management plans, database tools, and more to not only massage existing research objects to be FAIR but also better ensure newly created research objects are FAIR as well.


At that point, the paradigm of web services expanded to include infrastructure as a service or IaaS, with compute, storage, and database services running over the internet for web developers to utilize.<ref name="FurrierExclusive15" /><ref name="MillerHowAWS16" /> "If you believe developers will build applications from scratch using web services as primitive building blocks, then the operating system becomes the internet," noted AWS CEO Andy Jassy in a 2015 retrospective interview.<ref name="FurrierExclusive15" /> From that concept evolved the idea of determining what it would take to allow any entity to run their technology applications over Amazon's web-service-based IaaS platform. In August 2006, Amazon introduced its Amazon Elastic Compute Cloud (Amazon EC2), "a web service that provides resizable compute capacity in the cloud."<ref name="AWSAnnounc06">{{cite web |url=https://aws.amazon.com/about-aws/whats-new/2006/08/24/announcing-amazon-elastic-compute-cloud-amazon-ec2---beta/ |title=Announcing Amazon Elastic Compute Cloud (Amazon EC2) - beta |publisher=Amazon Web Services |date=24 August 2006 |accessdate=28 July 2023}}</ref><ref name="ButlerAmazon06">{{cite journal |title=Amazon puts network power online |journal=Nature |author=Butler, D. |volume=444 |issue=528 |year=2006 |doi=10.1038/444528a}}</ref> This quickly prompted others in academic and scientific fields to continue the conversation of turning IT and its infrastructure into a service.<ref name="ButlerAmazon06" /><ref name="KeITeS06">{{cite journal |title=ITeS - Transcending the Traditional Service Model |journal=Proceedings of the 2006 IEEE International Conference on e-Business Engineering |author=Ke, J.-s. 
|page=2 |year=2006 |doi=10.1109/ICEBE.2006.66}}</ref> In turn, conversations changed, discussing the opportunities inherent to "cloud computing," including Google and IBM partnering to virtualize computers on new data centers for boosting academic research and teaching new computer science students<ref name="LohrGoogle07">{{cite web |url=http://www.nytimes.com/2007/10/08/technology/08cloud.html?_r=1&or |archiveurl=http://www.csun.edu/pubrels/clips/Oct07/10-08-07E.pdf |format=PDF |title=Google and I.B.M. Join in 'Cloud Computing' Research |author=Lohr, S. |work=The New York Times |date=08 October 2007 |archivedate=08 October 2007 |accessdate=28 July 2023}}</ref><ref name="HandHead07">{{cite journal |title=Head in the clouds |journal=Nature |author=Hand, E. |volume=449 |issue=963 |year=2007 |doi=10.1038/449963a}}</ref>, IBM releasing a white paper on cloud computing<ref name="BossCloud07">{{cite web |url=http://download.boulder.ibm.com/ibmdl/pub/software/dw/wes/hipods/Cloud_computing_wp_final_8Oct.pdf |archiveurl=https://web.archive.org/web/20090206015244/http://download.boulder.ibm.com/ibmdl/pub/software/dw/wes/hipods/Cloud_computing_wp_final_8Oct.pdf |format=PDF |title=Cloud Computing |author=Boss, G.; Malladi, P.; Quan, D. et al. |publisher=IBM Corporation |date=08 October 2007 |archivedate=06 February 2009 |accessdate=28 July 2023}}</ref> and announcing its Blue Cloud initiative<ref name="LohrIBM07">{{cite web |url=https://www.nytimes.com/2007/11/15/technology/15blue.html |title=I.B.M. to Push 'Cloud Computing,' Using Data From Afar |author=Lohr, S. 
|work=The New York Times |date=15 November 2007 |accessdate=28 July 2023}}</ref>, and Google doubling down on its cloud-based software offerings in competition with Microsoft.<ref name="LohrGoogleGets07">{{cite web |url=https://www.nytimes.com/2007/12/16/technology/16goog.html |archiveurl=https://signallake.com/innovation/GoogleMicrosoft121607.pdf |format=PDF |title=Google Gets Ready to Rumble With Microsoft |author=Lohr, S.; Helft, M. |work=The New York Times |date=16 December 2007 |archivedate=16 December 2007 |accessdate=28 July 2023}}</ref>  
===FAIR research software===
Discussion of research software and its FAIRness is more complicated. It is beyond the scope of this article to go into greater detail about the concepts surrounding FAIR research software, but a brief overview will be attempted. When the FAIR principles were first published, the framework was largely being applied to research objects. However, researchers quickly recognized that any planning around updating processes and systems to make research objects more FAIR would have to be tailored to specific research contexts. This led to recognizing that digital research objects go beyond data and information, and that there is a "specific nature of software" used in research; that research software should not be considered "just data."<ref name="GruenpeterFAIRPlus20" /> As a result, researchers have begun to apply the core concepts of FAIR to research software, though slightly differently than to research objects.<ref name="NIHPubMedSearch" /><ref name="HasselbringFromFAIR20" /><ref name="GruenpeterFAIRPlus20" /><ref name=":0" /><ref name=":1" /><ref name=":2" />


IBM's 2007 white paper described cloud computing as a "pool of virtualized computer resources" that can<ref name="BossCloud07" />:
Unsurprisingly, what researchers consider to be "research software" for purposes of FAIR has historically been interpreted numerous ways. Does the commercial spreadsheet software used to make calculations on research data deserve to be called research software in parallel with the lab-developed bioinformatics application used to generate that data? Given the difficulties of gaining a consensus definition of the term, a 2021 international initiative called FAIRsFAIR made a good-faith effort to define "research software" with the feedback of multiple stakeholders. The short version of their resulting definition is that "[r]esearch software includes source code files, algorithms, scripts, computational workflows, and executables that were created during the research process, or for a research purpose."<ref name="GruenpeterDefining21">{{Cite journal |last=Gruenpeter, Morane |last2=Katz, Daniel S. |last3=Lamprecht, Anna-Lena |last4=Honeyman, Tom |last5=Garijo, Daniel |last6=Struck, Alexander |last7=Niehues, Anna |last8=Martinez, Paula Andrea |last9=Castro, Leyla Jael |last10=Rabemanantsoa, Tovo |last11=Chue Hong, Neil P. |date=2021-09-13 |title=Defining Research Software: a controversial discussion |url=https://zenodo.org/record/5504016 |journal=Zenodo |doi=10.5281/zenodo.5504016}}</ref> Of note is the last part, acknowledging that research software can be developed in the lab during the research process or developed beforehand by, for example, a commercial software developer with a strong purpose of being used for research. As such, Microsoft Excel may not be looked upon as research software, but an ELN or [[laboratory information management system]] (LIMS) thoughtfully developed with research activities in mind could be considered research software. More often than not, that software is going to be developed in-house. 
A growing push for the FAIRification of that software, as well as commercial research solutions, has seen the emergence of "research software engineering" as a domain of practice.<ref name="MoynihanTheHitch20">{{cite web |url=https://invenia.github.io/blog/2020/07/07/software-engineering/ |title=The Hitchhiker’s Guide to Research Software Engineering: From PhD to RSE |author=Moynihan, G. |work=Invenia Blog |publisher=Invenia Technical Computing Corporation |date=07 July 2020}}</ref><ref name="WoolstonWhySci22">{{Cite journal |last=Woolston |first=Chris |date=2022-05-31 |title=Why science needs more research software engineers |url=https://www.nature.com/articles/d41586-022-01516-2 |journal=Nature |language=en |pages=d41586–022–01516-2 |doi=10.1038/d41586-022-01516-2 |issn=0028-0836}}</ref> While in the past, broadly speaking, researchers often cobbled together research software with less of a focus on quality and reproducibility and more on getting their research published, today's push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed a much greater focus on the concept of "better software, better research"<ref name="WoolstonWhySci22" /><ref name="CohenTheFour21">{{Cite journal |last=Cohen |first=Jeremy |last2=Katz |first2=Daniel S. |last3=Barker |first3=Michelle |last4=Chue Hong |first4=Neil |last5=Haines |first5=Robert |last6=Jay |first6=Caroline |date=2021-01 |title=The Four Pillars of Research Software Engineering |url=https://ieeexplore.ieee.org/document/8994167/ |journal=IEEE Software |volume=38 |issue=1 |pages=97–105 |doi=10.1109/MS.2020.2973362 |issn=0740-7459}}</ref>, with research software engineering efforts focusing on that concept as being vital to future research outcomes. 
Cohen ''et al.'' add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."<ref name="CohenTheFour21" />


*  "host a variety of different workloads, including batch-style back-end jobs and interactive, user-facing applications";
*  "allow workloads to be deployed and scaled-out quickly through the rapid provisioning of virtual machines or physical machines";
*  "support redundant, self-recovering, highly scalable programming models that allow workloads to recover from many unavoidable hardware/software failures";
*  "monitor resource use in real time to enable rebalancing of allocations when needed"; and
*  "be a cost efficient model for delivering information services, reducing IT management complexity, promoting innovation, and increasing responsiveness through real-time workload balancing."


In 2011, the National Institute of Standards and Technology (NIST) came up with a more standards-based definition of cloud computing. They described it as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."<ref name="MellTheNIST11">{{cite web |url=https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf |format=PDF |title=The NIST Definition of Cloud Computing |author=Mell, P.; Grance, T. |publisher=NIST |date=September 2011 |accessdate=28 July 2023}}</ref> They went on to highlight five essential characteristics of the model<ref name="MellTheNIST11" />:


* On-demand self-service: The unilateral provision of computing resources should be an automatic or nearly automatic process.
===FAIRer research objects, better software, greater innovation===
* Broad network access: Thin- or thick-client platforms, both hardwired and mobile, should allow for standardized, networkable access to those computing resources.
* Resource pooling: A multi-tenant model requires the provisioning of resources to serve a wide customer base, with a layer of abstraction that gives the user a sense of location independence from those resources.
* Rapid elasticity: The platform's resources should be readily and/or automatically scalable commensurate with demand, such that the user sees no negative impact in their activities.
* Measured service: The resources should be automatically controlled and optimized by a measured service or metering system, transparently providing accurate and timely information about resource usage.


When we compare these 2007 and 2011 definitions of cloud computing with the comments on web services by Glass and Kreger at the turn of the century (as well as our own derived definition prior), we can't help but see how the early vision for cloud computing has taken shape today. First, web services can indeed be paired with other technologies to form a distributed system, in this case a centralized and scalable computing infrastructure that can be used by practically anyone to run software, develop applications, and "host a variety of different workloads."<ref name="BossCloud07" /> Second, those workloads can be quickly deployed worldwide, wherever there is internet access, and typically at a fair price, when compared to the costs of on-premises data management.<ref name="ViolinoWhere20">{{cite web |url=https://www.infoworld.com/article/3532288/where-to-look-for-cost-savings-in-the-cloud.html |title=Where to look for cost savings in the cloud |author=Violino, B. |work=InfoWorld |date=16 March 2020 |accessdate=28 July 2023}}</ref> Third, new opportunities are indeed developing for organizations seeking to tap into the on-demand, rapid, scalable, and cost-efficient nature of cloud computing.<ref name="OjalaDiscover16">{{cite journal |title=Discovering and creating business opportunities for cloud services |journal=Journal of Systems and Software |author=Ojala, A. |volume=113 |pages=408–17 |year=2016 |doi=10.1016/j.jss.2015.11.004}}</ref><ref name="PetteyCloud20">{{cite web |url=https://www.gartner.com/smarterwithgartner/cloud-shift-impacts-all-it-markets/ |title=Cloud Shift Impacts All IT Markets |author=Pettey, C. 
|work=Smarter with Gartner |date=26 October 2020 |accessdate=28 July 2023}}</ref> And finally, benefits are being seen in the integration of applications via the cloud, particularly as more options for multicloud and hybrid cloud integration develop.<ref name="PetteyFiveApp19">{{cite web |url=https://www.gartner.com/smarterwithgartner/5-approaches-cloud-applications-integration/ |title=5 Approaches to Cloud Applications Integration |author=Pettey, C. |work=Smarter with Gartner |date=14 May 2019 |accessdate=28 July 2023}}</ref> The early vision that perhaps hasn't been realized is found in Glass' "shift from client-server to peer-to-peer architectures," though discussions about the promise of peer-to-peer cloud computing have occurred since.<ref name="BabaogluEscape14">{{cite journal |title=The People's Cloud |journal=IEEE Spectrum |author=Babaoglu, O.; Marzolla, M. |volume=51 |issue=10 |pages=50–55 |year=2014 |doi=10.1109/MSPEC.2014.6905491 |url=https://spectrum.ieee.org/escape-from-the-data-center-the-promise-of-peertopeer-cloud-computing}}</ref>
==References==
{{Reflist|colwidth=30em}}


Though clearly linked to web services and the early vision of cloud computing in the 2000s, the cloud computing of the 2020s is a remarkably more advanced and continually evolving technology. However, it's still not without its challenges today. The data security, privacy, and governance of computing in general, and cloud computing in particular, will continue to require more rigorous approaches, as will reducing remaining data silos in organizations with pivots to hybrid cloud, multicloud, and serverless cloud implementations.<ref name="Goodison10Fut20">{{cite web |url=https://www.crn.com/news/cloud/10-future-cloud-computing-trends-to-watch-in-2021 |title=10 Future Cloud Computing Trends To Watch In 2021 |author=Goodison, D. |work=CRN |date=20 November 2020 |accessdate=28 July 2023}}</ref><ref name="DTCCCloud20">{{cite web |url=http://www.dtcc.com/-/media/Files/Downloads/WhitePapers/DTCC-Cloud-Journey-WP |format=PDF |title=Cloud Technology: Powerful and Evolving |author=DTCC |date=November 2020 |accessdate=28 July 2023}}</ref> But what is "hybrid cloud"? "Serverless cloud?" The next section goes into further detail.
<!---Place all category tags here-->

Revision as of 00:13, 8 May 2024

Sandbox begins below

Title: Why are the FAIR data principles increasingly important to research laboratories and their software?

Author for citation: Shawn E. Douglas

License for content: Creative Commons Attribution-ShareAlike 4.0 International

Publication date: May 2024

Introduction

The growing importance of the FAIR principles to research laboratories

The FAIR data principles were published by Wilkinson et al. in 2016 as a stakeholder collaboration driven to see research "objects" (i.e., research data and information of all shapes and formats) become more universally findable, accessible, interoperable, and reusable (FAIR) by both machines and people.[1] The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."[1] Since the principles were published, other researchers have taken this somewhat broad set of principles and refined it for their own scientific disciplines, as well as for other types of research objects, including the research software being used by those researchers to generate research objects.[2][3][4][5][6][7]

But why are research laboratories increasingly pushing for more findable, accessible, interoperable, and reusable research objects and software? The short answer, as evidenced by the Wilkinson et al. quote above, is that greater innovation can be gained through improved knowledge discovery. The discovery process necessary for that greater innovation, whether through traditional research methods or artificial intelligence (AI)-driven methods, is enhanced when research objects and software are compatible with the core ideas of FAIR.[1][8][9]

A slightly longer answer, suitable for a Q&A topic, requires looking at a few more details of the FAIR principles as applied to both research objects and research software. Research laboratories, whether located in an organization or contracted out as third parties, exist to innovate. That innovation can come in the form of discovering new materials that may or may not have a future application, developing a pharmaceutical to improve patient outcomes for a particular disease, or modifying (for some sort of improvement) an existing food or beverage recipe, among others. In academic research labs, this usually looks like knowledge advancement and the publishing of research results, whereas in industry research labs, this typically looks like more practical applications of research concepts to new or existing products or services. In both cases, research software was likely involved at some point, whether it be something like a researcher-developed bioinformatics application or a commercial vendor-developed electronic laboratory notebook (ELN).

FAIR research objects

Regarding research objects themselves, the FAIR principles essentially say "vast amounts of data and information in largely heterogeneous formats spread across disparate sources both electronic and paper make modern research workflows difficult, tedious, and at times impossible. Further, repeatability, reproducibility, and replicability of openly published or secure internal research results is at risk, giving less confidence to academic peers in the published research, or less confidence to critical stakeholders in the viability of a researched prototype." As such, research objects (which include not only their inherent data and information but also any metadata that describe features of that data and information) need to be[10]:

  • findable, with globally unique and persistent identifiers, rich metadata that link to the identifier of the data described, and an ability to be indexed as an effectively searchable resource;
  • accessible, being able to be retrieved (including metadata of data that is no longer available) by identifiers using secure standardized communication protocols that are open, free, and universally implementable with authentication and authorization mechanisms;
  • interoperable, represented using formal, accessible, shared, and relevant language models and vocabularies that abide by FAIR principles, as well as with qualified linkage to other metadata; and
  • reusable, being richly described by accurate and relevant metadata, released with a clear and accessible data usage license, associated with sufficiently detailed provenance information, and compliant with discipline-specific community standards.
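The four requirements above can be made a little more concrete with a minimal Python sketch that checks a metadata record for obvious FAIR gaps. Everything in it is an illustrative assumption and not part of the FAIR principles or the FAIR Cookbook: the ResearchObjectRecord class, its field names, the fair_gaps function, and the example identifier and URL are all hypothetical, and a real assessment would look at far more than whether a field is filled in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResearchObjectRecord:
    """Hypothetical metadata record describing one research object."""
    identifier: str = ""   # globally unique, persistent identifier (findable)
    title: str = ""        # descriptive metadata (findable)
    keywords: List[str] = field(default_factory=list)  # supports indexing (findable)
    access_url: str = ""   # retrieval over a standard protocol (accessible)
    vocabulary: str = ""   # shared vocabulary or ontology used (interoperable)
    license: str = ""      # clear data usage license (reusable)
    provenance: str = ""   # where the data came from (reusable)


def fair_gaps(record: ResearchObjectRecord) -> Dict[str, List[str]]:
    """Return, for each FAIR principle, the record fields that fail a basic check."""
    checks = {
        "findable": [
            ("identifier", bool(record.identifier)),
            ("title", bool(record.title)),
            ("keywords", bool(record.keywords)),
        ],
        "accessible": [
            # Assumption: HTTPS stands in for a "secure standardized protocol".
            ("access_url", record.access_url.startswith("https://")),
        ],
        "interoperable": [
            ("vocabulary", bool(record.vocabulary)),
        ],
        "reusable": [
            ("license", bool(record.license)),
            ("provenance", bool(record.provenance)),
        ],
    }
    return {
        principle: [name for name, ok in items if not ok]
        for principle, items in checks.items()
    }


# Hypothetical record: findable and accessible, but missing a vocabulary,
# a license, and provenance information.
example = ResearchObjectRecord(
    identifier="doi:10.1234/example",
    title="Example assay results",
    keywords=["assay", "example"],
    access_url="https://example.org/datasets/42",
)
gaps = fair_gaps(example)
print(gaps)
```

A lab could run a check like this over an export of its existing records to see which fields most often block FAIRness, though the fields that matter will differ by discipline and community standard.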

All that talk of unique persistent identifiers, communication protocols, authentication mechanisms, language models (e.g., ontology languages), standardized vocabularies, provenance information, and more could make one's head spin. And, to be fair, it has been challenging for research groups to adopt FAIR, with few widespread international efforts to translate the FAIR principles to broad research. The FAIR Cookbook represents one example of such an international collaborative effort, providing "a combination of guidance, technical, hands-on, background and review types to cover the operation steps of FAIR data management."[11] In fact, the Cookbook is illustrative of the challenges of implementing FAIR in research laboratories, particularly given the diverse array of vocabularies used across the wealth of scientific disciplines, such as biobanking, biomedical engineering, botany, food science, and materials science. The way a botanical research organization makes its research objects FAIR is going to require a different set of tools than the one used by a materials science research organization. But all of them will turn to informatics tools, data management plans, database tools, and more, not only to massage existing research objects into being FAIR but also to better ensure newly created research objects are FAIR as well.

FAIR research software

Discussion of research software and its FAIRness is more complicated. It is beyond the scope of this article to go into greater detail about the concepts surrounding FAIR research software, but a brief overview will be attempted. When the FAIR principles were first published, the framework was largely being applied to research objects. However, researchers quickly recognized that any planning around updating processes and systems to make research objects more FAIR would have to be tailored to specific research contexts. This led to recognizing that digital research objects go beyond data and information, and that there is a "specific nature of software" used in research; that is, research software should not be considered "just data."[4] The end result has seen researchers begin to apply the core concepts of FAIR to research software, but slightly differently from research objects.[2][3][4][5][6][7]

Unsurprisingly, what researchers consider to be "research software" for purposes of FAIR has historically been interpreted numerous ways. Does the commercial spreadsheet software used to perform calculations on research data deserve to be called research software in parallel with the lab-developed bioinformatics application used to generate that data? Given the difficulties of gaining a consensus definition of the term, a 2021 international initiative called FAIRsFAIR made a good-faith effort to define "research software" with the feedback of multiple stakeholders. The short version of their resulting definition is that "[r]esearch software includes source code files, algorithms, scripts, computational workflows, and executables that were created during the research process, or for a research purpose."[12] Of note is the last part, acknowledging that research software can be developed in the lab during the research process or developed beforehand by, for example, a commercial software developer with a strong purpose of being used for research. As such, Microsoft Excel may not be looked upon as research software, but an ELN or laboratory information management system (LIMS) thoughtfully developed with research activities in mind could be considered research software. More often than not, that software is going to be developed in-house. 
A growing push for the FAIRification of that software, as well as commercial research solutions, has seen the emergence of "research software engineering" as a domain of practice.[13][14] While in the past, broadly speaking, researchers often cobbled together research software with less of a focus on quality and reproducibility and more of a focus on getting their research published, today's push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed a much greater focus on the concept of "better software, better research"[14][15], with research software engineering efforts focusing on that concept as being vital to future research outcomes. Cohen et al. add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."[15]


FAIRer research objects, better software, greater innovation

References

  1. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC 4792175. PMID 26978244. https://www.nature.com/articles/sdata201618. 
  2. "fair data principles". PubMed Search. National Institutes of Health, National Library of Medicine. https://pubmed.ncbi.nlm.nih.gov/?term=fair+data+principles. Retrieved 30 April 2024. 
  3. Hasselbring, Wilhelm; Carr, Leslie; Hettrick, Simon; Packer, Heather; Tiropanis, Thanassis (25 February 2020). "From FAIR research data toward FAIR and open research software" (in en). it - Information Technology 62 (1): 39–47. doi:10.1515/itit-2019-0040. ISSN 2196-7032. https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html. 
  4. Gruenpeter, M. (23 November 2020). "FAIR + Software: Decoding the principles" (PDF). FAIRsFAIR “Fostering FAIR Data Practices In Europe”. https://www.fairsfair.eu/sites/default/files/FAIR%20%2B%20software.pdf. Retrieved 30 April 2024. 
  5. Barker, Michelle; Chue Hong, Neil P.; Katz, Daniel S.; Lamprecht, Anna-Lena; Martinez-Ortiz, Carlos; Psomopoulos, Fotis; Harrow, Jennifer; Castro, Leyla Jael et al. (14 October 2022). "Introducing the FAIR Principles for research software" (in en). Scientific Data 9 (1): 622. doi:10.1038/s41597-022-01710-x. ISSN 2052-4463. PMC 9562067. PMID 36241754. https://www.nature.com/articles/s41597-022-01710-x. 
  6. Patel, Bhavesh; Soundarajan, Sanjay; Ménager, Hervé; Hu, Zicheng (23 August 2023). "Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool" (in en). Scientific Data 10 (1): 557. doi:10.1038/s41597-023-02463-x. ISSN 2052-4463. PMC 10447492. PMID 37612312. https://www.nature.com/articles/s41597-023-02463-x. 
  7. Du, Xinsong; Dastmalchi, Farhad; Ye, Hao; Garrett, Timothy J.; Diller, Matthew A.; Liu, Mei; Hogan, William R.; Brochhausen, Mathias et al. (6 February 2023). "Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software" (in en). Metabolomics 19 (2): 11. doi:10.1007/s11306-023-01974-3. ISSN 1573-3890. https://link.springer.com/10.1007/s11306-023-01974-3. 
  8. Olsen, C. (1 September 2023). "Embracing FAIR Data on the Path to AI-Readiness". Pharma's Almanac. https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness. Retrieved 03 May 2024. 
  9. Huerta, E. A.; Blaiszik, Ben; Brinson, L. Catherine; Bouchard, Kristofer E.; Diaz, Daniel; Doglioni, Caterina; Duarte, Javier M.; Emani, Murali et al. (26 July 2023). "FAIR for AI: An interdisciplinary and international community building perspective" (in en). Scientific Data 10 (1): 487. doi:10.1038/s41597-023-02298-6. ISSN 2052-4463. PMC 10372139. PMID 37495591. https://www.nature.com/articles/s41597-023-02298-6. 
  10. Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Gu, Wei; Welter, Danielle; Abbassi Daloii, Tooba; Portell-Silva, Laura (30 June 2022). "Introducing the FAIR Principles". D2.1 FAIR Cookbook. doi:10.5281/ZENODO.6783564. https://zenodo.org/record/6783564. 
  11. Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Gu, Wei; Welter, Danielle; Abbassi Daloii, Tooba; Portell-Silva, Laura (30 June 2022). "Introduction". D2.1 FAIR Cookbook. doi:10.5281/ZENODO.6783564. https://zenodo.org/record/6783564. 
  12. Gruenpeter, Morane; Katz, Daniel S.; Lamprecht, Anna-Lena; Honeyman, Tom; Garijo, Daniel; Struck, Alexander; Niehues, Anna; Martinez, Paula Andrea et al. (13 September 2021). "Defining Research Software: a controversial discussion". Zenodo. doi:10.5281/zenodo.5504016. https://zenodo.org/record/5504016. 
  13. Moynihan, G. (7 July 2020). "The Hitchhiker’s Guide to Research Software Engineering: From PhD to RSE". Invenia Blog. Invenia Technical Computing Corporation. https://invenia.github.io/blog/2020/07/07/software-engineering/. 
  14. Woolston, Chris (31 May 2022). "Why science needs more research software engineers" (in en). Nature: d41586–022–01516-2. doi:10.1038/d41586-022-01516-2. ISSN 0028-0836. https://www.nature.com/articles/d41586-022-01516-2. 
  15. Cohen, Jeremy; Katz, Daniel S.; Barker, Michelle; Chue Hong, Neil; Haines, Robert; Jay, Caroline (1 January 2021). "The Four Pillars of Research Software Engineering". IEEE Software 38 (1): 97–105. doi:10.1109/MS.2020.2973362. ISSN 0740-7459. https://ieeexplore.ieee.org/document/8994167/.