Scientific data management system
Figure: NIST tests standard interfaces for its lab instruments. SDMSs allow labs to integrate raw and processed instrument data with other types of data, unstructured and structured.
Latest revision as of 16:14, 22 March 2024
A scientific data management system (SDMS) (occasionally referred to as a laboratory data management system [LDMS][1][2]) is software that acts similarly to a document management system (DMS), capturing, cataloging, and archiving data generated by laboratory instruments (e.g., high-performance liquid chromatography and mass spectrometry instruments) and applications (e.g., laboratory information management systems, electronic laboratory notebooks, and other analytical applications) in a compliant, often pre-defined manner best suited to its intended use, whether that data is structured, unstructured, or semi-structured.[3][4] The SDMS can also act as a gatekeeper, serving platform-independent data to informatics applications and other stakeholders.
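The capture-and-catalog role described above can be illustrated with a minimal Python sketch. It assumes a simple file-based archive: a raw instrument file is hashed with SHA-256, copied into a content-addressed archive directory, and described by a catalog entry so its integrity can be re-verified later. The names `CatalogEntry`, `capture`, `verify`, and the `HPLC-01` instrument ID are hypothetical illustrations, not the API of any particular SDMS product; real systems typically add database-backed catalogs, access control, and audit trails.

```python
import hashlib
import shutil
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class CatalogEntry:
    """One archived file plus the metadata needed to find and verify it later."""
    original_name: str
    archive_path: Path
    sha256: str
    size_bytes: int
    instrument_id: str  # hypothetical identifier for the source instrument
    captured_at: str    # ISO 8601 timestamp in UTC


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large instrument files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture(source: Path, archive_dir: Path, instrument_id: str) -> CatalogEntry:
    """Copy a raw file into a content-addressed archive and return its catalog entry."""
    checksum = sha256_of(source)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Naming the archived copy by its hash means identical files are stored only once.
    target = archive_dir / f"{checksum}{source.suffix}"
    if not target.exists():
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return CatalogEntry(
        original_name=source.name,
        archive_path=target,
        sha256=checksum,
        size_bytes=source.stat().st_size,
        instrument_id=instrument_id,
        captured_at=datetime.now(timezone.utc).isoformat(),
    )


def verify(entry: CatalogEntry) -> bool:
    """Re-hash the archived copy to detect corruption, tampering, or deletion."""
    return entry.archive_path.exists() and sha256_of(entry.archive_path) == entry.sha256


# Example: capture a small synthetic file into a temporary archive.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "run_001.csv"
    raw.write_bytes(b"time,signal\n0.0,12.5\n")
    entry = capture(raw, Path(tmp) / "archive", instrument_id="HPLC-01")
    intact = verify(entry)
```

Naming archived copies by their hash is one simple way to deduplicate identical uploads; a catalog keyed only by original file name would need a separate duplicate check.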
Purpose and technology
An SDMS is used to address data handling and management issues in a number of scientific disciplines. As the four Vs of modern big data (volume, variety, veracity, and velocity) increase the time spent on data acquisition and management, they take time away from other aspects of scientific research and complicate experimental reproducibility. Solutions such as an SDMS can help better manage the total lifecycle of data.[5] They do so through a variety of tools, including data normalization and integration, data sharing and management, metadata capture and management, data object and record management, and robust search tools.[5]
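As a rough illustration of the metadata normalization mentioned above, the following sketch maps vendor-specific metadata keys onto one common schema, keeps unrecognized fields rather than discarding them, and flags required fields that are missing. The key names in `KEY_MAP`, the `REQUIRED` fields, and the system IDs are invented for the example; actual instrument exports and SDMS schemas vary widely.

```python
from typing import Any

# Hypothetical mapping from vendor-specific metadata keys to a common schema.
# Real instrument exports differ; these key names are illustrative only.
KEY_MAP: dict[str, str] = {
    "SampleName": "sample_id",
    "sample": "sample_id",
    "AcqDate": "acquired_on",
    "acquisition_date": "acquired_on",
    "Operator": "analyst",
    "user": "analyst",
}

# Fields this hypothetical schema expects every record to carry.
REQUIRED = ("sample_id", "acquired_on", "analyst")


def normalize(raw: dict[str, Any], source_system: str) -> dict[str, Any]:
    """Rename known keys, keep unknown ones under 'extra', and note missing fields."""
    record: dict[str, Any] = {"source_system": source_system, "extra": {}}
    for key, value in raw.items():
        common = KEY_MAP.get(key)
        if common is None:
            record["extra"][key] = value  # preserve rather than silently discard
        else:
            # Trim stray whitespace, a common inconsistency between exports.
            record[common] = value.strip() if isinstance(value, str) else value
    record["missing"] = [name for name in REQUIRED if name not in record]
    return record


# Example: two systems describe the same sample with different key names.
hplc = normalize(
    {"SampleName": " S-042 ", "AcqDate": "2024-03-22", "Operator": "jdoe", "Column": "C18"},
    source_system="HPLC-01",
)
ms = normalize({"sample": "S-042", "acquisition_date": "2024-03-22"}, source_system="MS-02")
```

Once both records share a `sample_id`, results from the two instruments can be joined on it, and the `missing` list gives a reviewer a concrete gap to resolve.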
As with many other laboratory informatics tools, the lines between an SDMS, LIMS, ELN, and other systems are at times blurred, as functionality from each of these systems makes its way into the others.[1][4] However, an SDMS has some essential qualities that distinguish it from other informatics systems:
1. While a LIMS has traditionally been built to handle structured, mostly homogeneous data, an SDMS (and systems like it) is built to handle unstructured, mostly heterogeneous data,[6] though many SDMSs can handle structured, unstructured, and semi-structured data alike.
2. An SDMS typically acts as a seamless, largely invisible "wrapper" around other data systems in the laboratory, such as LIMS and ELN, though sometimes the SDMS software itself is readily apparent.
3. An SDMS is designed primarily for data consolidation and reuse, knowledge integration and management, and knowledge asset discovery and realization.[4][7]
An SDMS can be seen as one potential solution for handling unstructured data, which can make up nearly 75 percent of a research and development unit's data.[8] This includes PDF files, images, instrument data, spreadsheets, and other forms of data rendered in many environments in the laboratory. Traditional SDMSs have focused on acting as a nearly invisible blanket or wrapper that integrates information from corporate offices (standard operating procedures, safety documents, etc.) with data from lab devices and other data management tools, all indexed and searchable from a central database. An SDMS must also focus on increasing research productivity without sacrificing data sharing and collaboration.[8]
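The idea of indexing heterogeneous laboratory content so it is searchable from one central place can be sketched as a toy inverted index. This is a deliberate simplification: each record is reduced to extracted text plus a `kind` label, and a search returns the records containing every query token. The `Record` and `CentralIndex` classes and the example IDs are hypothetical; production systems would rely on a database or a dedicated search engine, text extraction from PDFs and images, and richer ranking.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    record_id: str
    kind: str  # e.g. "sop", "instrument", "image", "spreadsheet", "pdf"
    text: str  # extracted text or descriptive metadata for the item


@dataclass
class CentralIndex:
    """A toy inverted index mapping each token to the IDs of records that contain it."""
    records: dict[str, Record] = field(default_factory=dict)
    postings: defaultdict = field(default_factory=lambda: defaultdict(set))

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, record: Record) -> None:
        self.records[record.record_id] = record
        for token in self.tokenize(record.text):
            self.postings[token].add(record.record_id)

    def search(self, query: str, kind: Optional[str] = None) -> list[str]:
        """Return sorted IDs of records containing every query token, optionally filtered by kind."""
        tokens = self.tokenize(query)
        if not tokens:
            return []
        # .get() avoids inserting empty entries into the defaultdict for unknown tokens.
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        if kind is not None:
            hits = {rid for rid in hits if self.records[rid].kind == kind}
        return sorted(hits)


# Example: an SOP, an instrument run, and an image indexed side by side.
index = CentralIndex()
index.add(Record("SOP-7", "sop", "Standard operating procedure for HPLC column cleaning"))
index.add(Record("RUN-12", "instrument", "HPLC chromatogram, sample S-042, column C18"))
index.add(Record("IMG-3", "image", "Gel image of sample S-042"))
```

Here a query such as "HPLC column" returns both the SOP and the instrument run, which is the cross-source retrieval the paragraph describes; the optional `kind` filter narrows results to one type of content.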
Tasks a standard SDMS may be asked to perform include, but are not limited to, the following:[4][5][9][10]
- store and archive raw data files;
- interact in real time with simple and complex laboratory instruments;
- retrieve worklists from LIMS and convert them to sequence files;
- require review and approval of actions, with electronic signatures;
- capture data provenance information;
- allow for annotation of records and collections to inform or warn other users of relevant changes or errors;
- analyze and create reports on laboratory instrument functions;
- perform complex calculations and comparisons of two different sample groups;
- monitor environmental conditions and react when base operating parameters are out of range;
- act as an operational database that allows selective import and export of ELN data;
- manage workflows based on data imported into the SDMS;
- validate other computer systems and software in the laboratory; and
- identify and retrieve data and metadata useful for training artificial intelligence and machine learning agents.
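One listed task, capturing data provenance, can be sketched as a log that links each derived data item to its inputs, the processing step, and the person responsible, loosely echoing the entity, activity, and agent concepts of the W3C PROV model. The class names, IDs, and activities below are hypothetical, and the sketch omits the electronic signatures, timestamps, and tamper protection a regulated SDMS would need.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ProvenanceRecord:
    """Links one output to its inputs, the processing step, and who performed it."""
    output_id: str
    input_ids: tuple[str, ...]
    activity: str
    agent: str


class ProvenanceLog:
    def __init__(self) -> None:
        self._by_output: dict[str, ProvenanceRecord] = {}

    def record(self, output_id: str, input_ids: list[str], activity: str, agent: str) -> None:
        # Refuse to overwrite history: each output gets exactly one provenance record.
        if output_id in self._by_output:
            raise ValueError(f"{output_id} already has a provenance record")
        self._by_output[output_id] = ProvenanceRecord(output_id, tuple(input_ids), activity, agent)

    def lineage(self, data_id: str) -> Iterator[ProvenanceRecord]:
        """Walk back from a data item through every step that contributed to it (depth first)."""
        stack, seen = [data_id], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            rec = self._by_output.get(current)
            if rec is None:
                continue  # no record: treated as raw data captured directly
            yield rec
            stack.extend(rec.input_ids)

    def raw_sources(self, data_id: str) -> set[str]:
        """Return the IDs with no recorded parent, i.e. the raw inputs behind data_id."""
        if data_id not in self._by_output:
            return {data_id}
        inputs = {i for rec in self.lineage(data_id) for i in rec.input_ids}
        return {i for i in inputs if i not in self._by_output}


# Example: two raw runs are integrated, then compared in a report.
log = ProvenanceLog()
log.record("peaks-1", ["raw-hplc-1"], "peak integration", "jdoe")
log.record("peaks-2", ["raw-hplc-2"], "peak integration", "jdoe")
log.record("report-1", ["peaks-1", "peaks-2"], "compare sample groups", "asmith")
```

Tracing `report-1` back yields every processing step and both raw HPLC runs behind it, which is the kind of answer an auditor or a reproducibility check would need.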
Further reading
- Stansberry, Dale; Somnath, Suhas; Breet, Jessica; Shutt, Gregory; Shankar, Mallikarjun (December 2019). "DataFed: Towards Reproducible Research via Federated Data Management". 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (Las Vegas, NV, USA: IEEE): 1312–1317. doi:10.1109/CSCI49370.2019.00245. ISBN 978-1-7281-5584-5. https://ieeexplore.ieee.org/document/9071425/.
References
1. Kranjc, Tilen (16 August 2021), Zupancic, Klemen; Pavlek, Tea; Erjavec, Jana, eds., "Introduction to Laboratory Software Solutions and Differences Between Them" (in English), Digital Transformation of the Laboratory (Wiley): 75–84, doi:10.1002/9783527825042.ch3, ISBN 978-3-527-34719-3, https://onlinelibrary.wiley.com/doi/10.1002/9783527825042.ch3
2. Avunjian, S. (17 November 2023). "Laboratory Software Systems: What You Need to Know to Make an Informed Decision". LigoLab Blog. LigoLab Information Systems. https://www.ligolab.com/post/laboratory-software-systems-what-you-need-to-know-to-make-an-informed-decision. Retrieved 22 March 2024.
3. Hayward, S. (15 May 2017). "Experts Explain: The Rise of Laboratory Data Lakes". Laboratory Equipment. Advantage Business Media. Archived from the original on 16 May 2017. https://web.archive.org/web/20170516235859/http://www.laboratoryequipment.com/article/2017/05/experts-explain-rise-laboratory-data-lakes. Retrieved 22 March 2024.
4. "ASTM E1578-18 Standard Guide for Laboratory Informatics". ASTM International. 23 August 2019. https://www.astm.org/e1578-18.html. Retrieved 22 March 2024.
5. Stansberry, Dale; Somnath, Suhas; Breet, Jessica; Shutt, Gregory; Shankar, Mallikarjun (December 2019). "DataFed: Towards Reproducible Research via Federated Data Management". 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (Las Vegas, NV, USA: IEEE): 1312–1317. doi:10.1109/CSCI49370.2019.00245. ISBN 978-1-7281-5584-5. https://ieeexplore.ieee.org/document/9071425/.
6. Elliott, M.H. (31 October 2003). "Considerations for Management of Laboratory Data". Scientific Computing. Advantage Business Media. Archived from the original on 26 April 2017. https://web.archive.org/web/20170426150419/http://www.scientificcomputing.com/article/2003/10/considerations-management-laboratory-data. Retrieved 22 March 2024.
7. Wood, S. (September 2007). "Comprehensive Laboratory Informatics: A Multilayer Approach" (PDF). American Laboratory. p. 1. Archived from the original on 25 August 2017. https://web.archive.org/web/20170825181932/https://www.it.uu.se/edu/course/homepage/lims/vt12/ComprehensiveLaboratoryInformatics.pdf.
8. Deutsch, S. (31 December 2006). "Tomorrow’s Successful Research Organizations Face a Critical Challenge". R&D World. WTWH Media LLC. http://www.rdworldonline.com/tomorrows-successful-research-organizations-face-a-critical-challenge/. Retrieved 22 March 2024.
9. Valle, Mario. "Scientific Data Management". Swiss National Supercomputing Center. Archived from the original on 6 March 2012. http://web.archive.org/web/20120306015034/http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html. Retrieved 22 March 2024.
10. Heyward, J.E. II (5 November 2009). "Selection of a Scientific Data Management System (SDMS) Based on User Requirements". Indiana University-Purdue University Indianapolis. pp. 5. doi:10.7912/C2/812.