Journal:A roadmap for LIMS at NIST Material Measurement Laboratory

From LIMSWiki
Revision as of 22:19, 9 May 2022 by Shawndouglas (talk | contribs) (Finished adding rest of content.)
Full article title: A roadmap for LIMS at NIST Material Measurement Laboratory
Author(s): Greene, Gretchen; Ragland, Jared; Trautt, Zachary; Lau, June; Plante, Raymond; Taillon, Joshua; Creuziger, Adam; Becker, Chandler; Bennett, Joseph; Blonder, Niksa; Borsuk, Lisa; Campbell, Carelyn; Friss, Adam; Hale, Lucas; Halter, Michael; Hanisch, Robert; Hardin, Gary; Levine, Lyle; Maragh, Samantha; Miller, Sierra; Muzny, Christopher; Newrock, Marcus; Perkins, John; Plant, Anne; Ravel, Bruce; Ross, David; Scott, John H.; Szakal, Chris; Tona, Alessandro; Vallone, Peter
Author affiliation(s): National Institute of Standards and Technology
Year published: 2022
Volume and issue: NIST Technical Note 2216
Page(s): i–iii, 1–17
DOI: 10.6028/NIST.TN.2216
Distribution license: Public domain
Website: https://www.nist.gov/publications/roadmap-lims-nist-material-measurement-laboratory
Download: https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=934610 (PDF)

Foreword

Over the past decade, emerging technology in laboratory and computational science has changed the landscape for research by accelerating the production, processing, and exchange of data. The NIST Material Measurement Laboratory community recognizes that to keep pace with the transformation of measurement science to a digital paradigm, it is essential to implement laboratory information management systems (LIMS). Effective introduction of LIMS early in the research life cycle provides direct support for planning and execution of experiments and accelerating research productivity. From this perspective, LIMS are not passive entities with isolated interaction, but rather key resources supporting collaboration, scientific integrity, and transfer of knowledge over time. They serve as a delivery system for organizational contributions to the broader federated data community, supporting both controlled and open access, determined by the sensitivity of the research.

The overall goal of a successful LIMS is to empower a research community by establishing common tools providing access to laboratory data resources. Modern LIMS should therefore provide several core functions and touchpoints:

  • Workflow management – A research workflow describes steps to be performed to derive results. These patterns serve as a prescription for LIMS to control the progression of data and associated services or tools. Automation of a workflow simplifies the transfer of information through defined interfaces from a network of systems.
  • Repository of data – Effective storage and retrieval of data (raw and derived)—including associated metadata, data products, calibration, software, logs, etc.—facilitates data discovery, processing, collaboration, and dissemination.
  • Creation of data products and tools – A LIMS should support storage and processing of raw data, leading to products which can be shared and consumed. Examples would include sample data, instrument-generated data, and algorithms generating defined outputs. Inclusion of data models provides context and structure, and machine learning (ML) integration may generate related data which could be combined into more comprehensive data models. Tools may include visualization, evaluation, and analysis packages offering users advanced capabilities for their research projects.
  • Organization of data for search and retrieval – Tools and interfaces give users access to sophisticated searches of data holdings and efficient mechanisms for data transfer in standardized formats. Searching should extend to domain or project-specific semantics, be coupled closely with related data, and go beyond individual research projects to include super-searches (e.g., use-case-driven interoperability between LIMS).
  • Long-lived, stable, and agile structures – LIMS require institutional and architectural sustainability for long baseline research and curatorship. Technology tends to change faster than the practical lifetime of research programs, so paths must exist for maintaining IT infrastructure and introducing faster and more complex processes.
  • Standards and best practices – LIMS benefit from standardization to support collaborations among research communities and make data workflows efficient and affordable. Community buy-in for standards and best practices is an essential part of LIMS, and organizational shared expertise naturally serves as a means for coordination and adaptation of standards.
  • User involvement – In all the core functions listed above, it is critical to involve the subject matter experts from the beginning. LIMS should establish a working team that explicitly includes representatives from the end user community.

Abstract

Instrumentation generates data faster and in greater quantity than ever before, and inter-laboratory research is in historic demand domestically and internationally to stimulate economic innovation. Strategic mission needs of the NIST Material Measurement Laboratory (MML) to support a wide array of research disciplines therefore compel our organization to adopt advanced strategies for research data management. Laboratory information management systems (LIMS) provide a framework for managing data from the outset of the research life cycle, delivering new capabilities for machine learning (ML), data analysis, collaboration, and dissemination. This roadmap describes our current understanding and strategy for adapting our research workflows for LIMS throughout MML by embracing the use of standards and best practices from data science communities. The NIST research data cyber-infrastructure complements these goals for MML by providing a secure environment to host LIMS solutions. Additionally, integration of scientific workflows requires ongoing collaboration to bridge organizational LIMS with external scientific communities. Thus, MML LIMS will evolve over time in synergy with the technology and experimental environments, delivering new science. LIMS will broaden our mission impact through adoption of the FAIR Data Principles.

Keywords: data, laboratory information management systems, experimental data, research data, research workflows

Introduction

Beginning late 2019, MML initiated as part of its strategic plan the development of "next-generation" data and informatics with a focus on LIMS as a key resource to support research data and science. This effort was complemented by initiatives for enhancing data management planning and data systems infrastructure. The first year’s groundwork established common needs for both individual researchers and teams to engage more readily with LIMS, with a goal of building greater capacity for interaction and use of data. A vision for LIMS was written to convey the purpose of these collective efforts:

“Laboratory information management systems will provide MML scientists a practical means for repeatability, traceability, reproducibility, efficiency, and compliance of research, serving as a beacon to both intramural and extramural community stakeholders.”

The MML approach to implementing LIMS started with defining goals for specific division research laboratory projects and established a cross-divisional Community of Interest (COI) group for sharing solutions, services, practices, and challenges. Comprehensive LIMS solutions have been successfully implemented in several NIST laboratories. Several shared resources have successfully demonstrated use of LIMS components including repository platforms, a standard persistent identifier service, a centralized research data storage with networked data transfer nodes, and data transfer services, in addition to expertise in data modeling and semantics. These resources, along with community best practices, contribute to a basic LIMS architecture for research.

A system view and architecture model provide the foundation for planned future outcomes. In this roadmap, we define a set of research-oriented LIMS capabilities which serves to guide implementation along with components to deliver these capabilities. More detailed guidance for use of specific LIMS resources is available internally to NIST and where possible shared on external repository websites. It is also anticipated that LIMS implementation will provide an important contribution to the goals of NIST program areas such as artificial intelligence (AI), biosystems, chemical informatics, additive manufacturing, and the materials science areas which spearheaded early innovation in data systems through the Materials Genome Initiative.

Roadmap objectives

This roadmap provides a framework for manifesting the MML LIMS vision and outlines the key objectives highlighted by the MML LIMS COI project goals. These are grouped into near- and long-term objectives for MML LIMS prioritization. These objectives, along with broader community efforts, will strengthen the NIST data-as-an-asset[1] strategic approach to research. More comprehensive goals such as the development of a "digital twin"[2] will enable models to probe the measurement science space to further analyze the physics and gain understanding, leading to new science.

Near-term objectives include:

  • Establish a LIMS COI for MML (as of this writing, already in place)
  • Develop pilot LIMS solutions for targeted research workflows (as of this writing, several solutions already piloted and deployed for laboratory operations)
  • Design tiered LIMS architectures to support a range of research workflow implementations
  • Establish core infrastructure services to support LIMS
  • Develop data acquisition and experimental activity capture solutions
  • Deploy key functional support services such as a Handle.Net service (supports persistent identifiers) and data transfer service (see the later subsection on supporting services)
  • Prototype and exchange LIMS components as a basis for shared resources (e.g., repository platforms for experimental activity; instrumentation; samples; extract, transform, and load [ETL])
  • Establish best practices in digital object management to support standards and community practice

Long-term objectives include:

  • Establish best practices for development and implementation of data models and semantics
  • Deploy prioritized LIMS end-to-end solutions (achieving multi-component level functionality)
  • Develop use-case-driven solutions for cross-laboratory LIMS interconnectivity
  • Provide on-demand system-level LIMS resources for research teams at NIST for rapid engagement
  • Develop automated workflow integration between LIMS and computational platforms (e.g., high-performance computing [HPC], ML, and analysis applications like SciServer)
  • Develop integration between LIMS and public data access systems
  • Establish methodology for building and applying "digital twin" models
  • Provide NIST leadership a strategy for adoption and implementation of LIMS to promote innovative data-driven science

Challenges

Within the MML LIMS COI exchange forum, lessons learned and shared by early adopters of LIMS highlighted several key challenges. These will be factored into data management plans for implementation and extended to the broader MML community, which relies on either operation of LIMS or end-usage LIMS outputs.

One common challenge with instrumentation data is generation of vendor-proprietary output formats. A repository for sharing data format exchange software tools is a good example of a solution benefiting LIMS by supporting the need to transform vendor data into more consumable open data formats for downstream analysis and computation. A prototype repository was created by the Office of Data and Informatics (ODI) with a few extractor tools, and efforts are underway to explore how this may achieve wider utility. Community repositories such as Bio-Formats and MaterialsIO are examples of resources which support tools for conversions of third-party data into open data models. These community-oriented solutions successfully demonstrate methods to lower the barrier for LIMS through shared software.
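
As an illustration of the shared-extractor idea, the following minimal Python sketch registers one parser per vendor file extension and writes the result as JSON. Everything in it is hypothetical: the `.vnd` format, the `EXTRACTORS` registry, and the function names are invented for this example and do not reflect the ODI prototype, Bio-Formats, or MaterialsIO.

```python
import csv
import io
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: file extension -> function that parses raw bytes into a dict.
EXTRACTORS: Dict[str, Callable[[bytes], dict]] = {}


def extractor(extension: str):
    """Decorator that registers a parser for one vendor file extension."""
    def register(func: Callable[[bytes], dict]) -> Callable[[bytes], dict]:
        EXTRACTORS[extension.lower()] = func
        return func
    return register


@extractor(".vnd")
def parse_vendor_text(raw: bytes) -> dict:
    """Parse a made-up format: 'key=value' header lines, '---', then numeric CSV rows."""
    header_text, _, table_text = raw.decode("utf-8").partition("\n---\n")
    metadata = dict(
        line.split("=", 1) for line in header_text.splitlines() if "=" in line
    )
    rows = [
        [float(value) for value in row]
        for row in csv.reader(io.StringIO(table_text))
        if row
    ]
    return {"metadata": metadata, "data": rows}


def convert_to_open_format(filename: str, raw: bytes) -> str:
    """Dispatch on the file extension and return the content as a JSON string."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor registered for {suffix!r}")
    return json.dumps(EXTRACTORS[suffix](raw), indent=2, sort_keys=True)


# Invented sample content standing in for a vendor file.
sample = b"instrument=SEM-01\noperator=jdoe\n---\n1.0,2.5\n2.0,3.5\n"
converted = convert_to_open_format("scan_001.vnd", sample)
print(converted)
```

Keeping each parser behind a common registry means a new vendor format only needs one additional function, which is the property that makes such tools easy to share between laboratories.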

Another challenge is finding the appropriately skilled staff required to engineer domain-specific LIMS workflows. This is a common barrier to LIMS prioritization for research organizations. Integrating data structures requires close collaboration between domain and data science subject matter experts (SMEs) for modeling and mapping of multiple source data to repository storage.

Integration or migration of legacy systems and bespoke tools with next-generation LIMS architecture presents another challenge, especially for those with limited resources supporting maintenance. Legacy systems commonly lack sustainability due to factors such as end of funding support or unavailable expertise.

Data provenance is commonly required for sample tracking and traceability across laboratory processes (e.g., sample transformations, generation of parts, or inter-laboratory sample exchange). The latter is a challenge for inter-laboratory study because data management systems (including LIMS) are most commonly not standard or normalized. Supporting common data exchange protocols and chain of custody workflows will be an ongoing design consideration for interoperability, including concepts such as data trust and integrity.
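
One way to make a chain-of-custody record tamper-evident is to link each entry to the digest of the previous one. The sketch below is only an illustration of that general technique under assumed field names (`actor`, `action`, `timestamp`); it is not a description of any NIST system or exchange protocol.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class CustodyLog:
    """Hypothetical hash-chained custody log for a single sample."""
    sample_id: str
    entries: List[dict] = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so the same body always hashes identically.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def record(self, actor: str, action: str, timestamp: str) -> dict:
        """Append an entry that embeds the digest of the previous entry."""
        previous = self.entries[-1]["digest"] if self.entries else ""
        body = {
            "sample_id": self.sample_id,
            "actor": actor,
            "action": action,
            "timestamp": timestamp,
            "previous": previous,
        }
        entry = dict(body, digest=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every digest and check that the links are unbroken."""
        previous = ""
        for entry in self.entries:
            body = {key: value for key, value in entry.items() if key != "digest"}
            if body["previous"] != previous or self._digest(body) != entry["digest"]:
                return False
            previous = entry["digest"]
        return True


log = CustodyLog("SAMPLE-0001")
log.record("lab-a", "prepared", "2022-01-10T09:00:00Z")
log.record("lab-b", "received", "2022-01-12T14:30:00Z")
intact = log.verify()

# Altering an earlier entry breaks verification.
log.entries[0]["actor"] = "someone-else"
tampered_detected = not log.verify()
```

A receiving laboratory that holds only the final digest can confirm that the history it was sent has not been edited, which speaks to the data trust and integrity concerns noted above.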

Another common challenge is operational security compliance for IT infrastructure. NIST adheres to CIS Controls (Critical Security Controls), and as LIMS architectures rely on networked systems, this translates to requirements for vigilant monitoring of service and platform deployments to ensure organizational security.

MML LIMS stakeholders

MML LIMS has a number of stakeholders with differing needs and priorities. ODI and research project leads work with all stakeholder categories, balancing goals to develop options that provide the best overall benefit. Stakeholders include:

  • Research community: Each research discipline area involves a community benefiting from the generation and exchange of data outcomes.
  • MML Laboratory Office: The MML Lab Office ensures support of the NIST organizational mission through the productivity achieved from the use of LIMS.
  • Program funding sources: Funding for strategic initiatives and research priorities for NIST and MML address both internal and external stakeholder needs and account for LIMS resources.
  • Collaborative partners: Critical stakeholders help foster research innovation and science through exchange of LIMS data and software products. A key benefit to LIMS is the ability to support collaboration through exchange and access to data, data products, software, and related resources.

External solutions and partnering

Working with the external community has resulted in a more robust MML LIMS knowledge base, benefiting from partnerships with external researchers during LIMS development and implementation, commercial procurement, and collaboration. Several organizational partners have contributed to MML efforts, including the National Renewable Energy Laboratory[3], Oak Ridge National Laboratory (ORNL), Brookhaven National Laboratory (BNL), Air Force Research Laboratory, 3M Corporation, NASA, University of Illinois Urbana-Champaign, and national forensic crime labs in collaboration with the National Institute of Justice.[4] Many NIST strategic research areas rely on collaborative data engagement; as such, having a LIMS with capabilities for supporting external access is important in many cases.

Collaboration with external partners may involve co-development of a LIMS platform system (or component), adoption of a community LIMS, shared access to data/code, or support for commercial vendor solution customization. Such collaborations often include harmonizing requirements and goals to inform the design, architecture, and engineering of a solution. Open-source solutions often provide more mechanisms supporting the complexity of many research workflows through their flexible configuration, customization, and independent software enhancements. A few examples of such solutions include ORNL DataFed[5], SynBioHub, BNL’s BlueSky platform[6], and SciServer. While these may require engineering expertise to fit within the NIST infrastructure, each adds meaningful user capabilities.

As one example, NIST synchrotron beamline stations at the BNL National Synchrotron Light Source II (NSLS-II) Facility have implemented BlueSky in partnership with BNL’s Data Science and Systems Integration (DSSI) group.

Two solutions have also been developed by partnering within NIST, and both are highly customized to the research requirements of their respective user groups. They include:

  1. the Nexus Electron Microscopy LIMS (NexusLIMS[7]) workflow based on the Configurable Data Curation System (CDCS), a NIST-developed open platform, and
  2. a LIMS supporting real-time biosystem cell line sample tracking, with a custom Excel application used for experimental activity capture.

A few commercial solutions[a] were adopted in part or in full. In one example, the NIST Center for Automotive Lightweighting (NCAL) successfully implemented the commercial platform Ansys Granta. Other community and vendor solutions have been and continue to be evaluated for pilot use, such as 4CeeD[8], the Tadabase no-code solution, the Benchling cloud platform, Microsoft platforms, and others. Due to the challenging nature of customized research workflows and complexity in the secure integration with government networked infrastructure, commercial solutions may pose additional cost and skillset requirements for successful adaptation into the laboratory working environment. In most cases at NIST, use case development leads to the adoption of hybrid solutions. Closed-source solutions widely adopted in the scientific community (e.g., Globus) provide unique and robust capabilities that are difficult to recreate. In the instance of Globus, the linkage to several research community services and best practices like GridFTP, linked identity management with institutional authentication (including InCommon), a Python software development kit (SDK), and multi-platform storage connections provide high value and ease of adoption.

Implementation of the roadmap

System level solutions

A LIMS, as we define it, is a system of components which delivers the capabilities for the early stages of a research life cycle.[9] It is widely recognized there is no singular LIMS solution that ranges across all disciplines of research, yet shared components (and data assets) provide greater economy of scale and consistent usage across the organization and beyond. A LIMS design implements research workflow requirements and provides context for assembling an architecture supporting component integration to produce desired outcomes. Off-the-Shelf (OTS) LIMS solutions are often challenging to adopt, primarily due to the limitations in both configuration and customization of components to match workflows. Monolithic solutions have demonstrated challenges whether they are homegrown or OTS by constraining interfaces between components. Workflow flexibility, orchestration, and evolution can be managed with lower risk[10] to overall performance when using service-oriented solutions. Therefore, implementations may vary as to which components are used and in which sequence to support the required research workflow. A tiered model ranging from basic components and plug-in architecture to instantiation of more complex data models with computational support can lower the barrier for entry. Given this context and the goal of flexibility, a few commonly used LIMS workflow components provide critical functionality.

LIMS tiers

Requirements for building or deploying LIMS are dependent upon the complexity of the research workflow and project needs. In considering the system, a basic three-tiered model serves as a general guide for mapping capabilities and implementation to level of effort (Fig. 1). As with many aspects of LIMS, this model may have variations in the strata depending on the optimal architecture for achieving desired outcomes.



Figure 1. LIMS three-tiered model for implementation

Tier 1: On-demand resources

Tier 1 is primarily reliant on infrastructure and support services and can be readily adopted with available “on-demand” resources, meaning those resources that exist and can be provisioned for use upon request. The main functions for Tier 1 LIMS include data acquisition, near-line instrument data collection, sample tracking, and data movement—possibly automated—to a storage location, ideally with access for processing and analysis. Tier 1 LIMS support unstructured data and may provide more flexibility for researchers to experiment with data structures, formats, and tuning for workflow.
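
The automated data movement described for Tier 1 can be pictured with a short sketch that copies files from an instrument directory to a storage location and verifies each copy with a SHA-256 checksum. The directory names and the `transfer` function are assumptions made for illustration; a production deployment would more likely rely on a managed data transfer service than on a script like this.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large raw files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer(source_dir: Path, storage_dir: Path) -> Dict[str, str]:
    """Copy every file in source_dir to storage_dir and return {filename: checksum}."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    manifest: Dict[str, str] = {}
    for source in sorted(p for p in source_dir.iterdir() if p.is_file()):
        target = storage_dir / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        checksum = sha256_of(source)
        if sha256_of(target) != checksum:
            raise IOError(f"checksum mismatch for {source.name}")
        manifest[source.name] = checksum
    return manifest


# Demonstration in a throwaway directory; both folder names are placeholders.
with tempfile.TemporaryDirectory() as workspace:
    instrument = Path(workspace) / "instrument_pc"
    storage = Path(workspace) / "central_storage"
    instrument.mkdir()
    (instrument / "run_001.dat").write_bytes(b"raw detector counts")
    manifest = transfer(instrument, storage)
    copied = (storage / "run_001.dat").read_bytes()
```

The returned manifest is unstructured-data friendly: it records what was moved without assuming anything about the file contents, which matches the flexibility intended for this tier.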

Tier 2: Data science and services

Tier 2 integrates research workflow design, including data structuring, formatting, metadata, and possible integration to a data repository solution. The Tier 2 LIMS also requires a greater level of effort and engagement between a research SME and data science engineers for design, installation, and operational configuration.

Tier 3: Discipline science

The Tier 3 LIMS require the highest level of commitment and collaboration between the research SME and a data science expert to factor in all the functionality for more complex workflows. This might involve functionality such as computational system integration, support for external client tools through an application programming interface (API), and semantic modeling to factor in community adoption for data consumption. An example of a Tier 3 LIMS would include data generated from multiple instruments, each with variable processes, interconnected to computational tools and applications to produce data products, with dependencies on analytical requirements for external interoperability. Both the MML NexusLIMS and Granta NCAL systems are examples of Tier 3.

Supporting services

Common services support LIMS at various touch points throughout the research data workflow. These may be considered part of the infrastructure, i.e., they are underlying services shared and configured to support more than one stage of a LIMS workflow. However, by nature, they often require a contextual prescription for how they interface with a LIMS workflow or component. Examples of these key support services include:

  • Data transfer services: manual, automation, and tools for file movement between storage locations
  • Handle service: generates and resolves a standard persistent identifier (PID) known as a Handle
  • Repository systems: OTS solutions or well-established and supported repository platforms (e.g., CDCS, Cordra, GitLab, GitHub) and open-source solutions supporting customization for (meta)data
  • Containerized deployments: Docker, Kubernetes, virtual machine (VM) or cloud services supporting deployments of LIMS applications and tools
  • Solutions brokering: Documentation and communications, including instructional guides (“playbooks”) and consultation for design and use of LIMS systems, services, and components
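
To illustrate what a persistent identifier service contributes, the sketch below models the mint, resolve, and update operations with an in-memory registry. It is a conceptual stand-in only: the `PidRegistry` class, the prefix `20.500.99999`, and the example.org URLs are invented, and the code does not call or imitate the Handle.Net software interface.

```python
import itertools
from typing import Dict


class PidRegistry:
    """Hypothetical in-memory registry of 'prefix/suffix' identifiers."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._counter = itertools.count(1)
        self._targets: Dict[str, str] = {}

    def mint(self, target_url: str) -> str:
        """Create a new identifier that points at target_url."""
        pid = f"{self.prefix}/{next(self._counter):06d}"
        self._targets[pid] = target_url
        return pid

    def update(self, pid: str, new_target_url: str) -> None:
        """Repoint an existing identifier, e.g., after data moves to new storage."""
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_target_url

    def resolve(self, pid: str) -> str:
        """Return the current location registered for the identifier."""
        return self._targets[pid]


registry = PidRegistry("20.500.99999")  # placeholder prefix, not a real assignment
pid = registry.mint("https://example.org/storage-a/dataset-42")
first_location = registry.resolve(pid)

# The data moves; the identifier that users cite stays the same.
registry.update(pid, "https://example.org/storage-b/dataset-42")
second_location = registry.resolve(pid)
```

The point of the indirection is that citations, notebooks, and downstream records store only the identifier, so storage can be reorganized without breaking references.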

Infrastructure resources for LIMS

Organizationally, MML and NIST both provide infrastructure as a service (IaaS) for networking, storage, and compute resources. These systems may be requested through internal NIST IT services and are readily available.

As part of the MML Data and Informatics strategic plan, both network and storage have been significantly expanded to support higher bandwidth for data transfer between laboratory instruments and storage (see Fig. 2). Data space allocations are designed to be flexible and can be established on request for cross-organizational projects, research teams, and instrument-dedicated endpoints. Several network-attached storage (NAS) solutions have been implemented to support both localized (data collection nodes) and central data storage (CDS; a dedicated MML Research Data storage array). Additionally, the NIST Amazon Web Services (AWS) cloud environment is available and integrated with the NIST VPN (virtual private network), providing both storage and compute on demand.

A research equipment network (REN) is used to manage pass-through and routing between instrument laboratory equipment and data storage. Network configuration via the REN ensures that instrument-control computers are protected and isolated as needed for secure operations. Higher speed network backbones and a Science DMZ[11] are currently in a planning phase to provide higher throughputs and secured zones for operating with LIMS architectures.



Figure 2. MML LIMS data plumbing model for harvesting instrumental raw data

Current infrastructure resources include DCN, NAS, CDS, AWS, and local research data storage, as well as enki, AWS, SciServer, and HPC computational resources. Meanwhile, efforts are ongoing to integrate the CIS Controls into the infrastructure and expand capabilities in support of LIMS. Security for government research networks is a top priority for NIST, and all LIMS deployments must adhere to cybersecurity controls and processes for monitoring and managing access. This critical infrastructure element requires coordination between LIMS developers and IT security offices to ensure systems are protected and compliant with CIS, including monitoring, security patching, and notification of problems for mitigation. In developing a plan to implement LIMS, the resources needed to complete security assessment and authorization should be taken into consideration.

Community standards and best practices

Several standards and best practices in the research data community are key to building an effective LIMS workspace. While we list a few concepts core to LIMS and examples of use, these are only a small subset in the growing field of data science. Standards and best practices will continue to evolve, requiring resources for maintenance, expansion, community engagement, and user training, though one goal of a successful LIMS implementation is to minimize these burdens for end users. Examples of data standards and practices used with LIMS are found in Table 1.

Table 1. Data standards and practices used with LIMS
Community standard/practice | Examples of use
Persistent identifiers | DOIs, ORCIDs, ARKid, Handles
Open-source code | USNISTGOV GitHub (ETL pipelines, libraries, tools)
FAIR (findable, accessible, interoperable, reusable) Data Principles[12] | Go-Fair.org, FAIR Maturity Model[13]
Semantic standards | DCAT, Schema.org, DataCite, LinkedData, discipline-specific taxonomy, ontological
Standardized communications network protocols | RESTful API or Open API, e.g., Swagger Docs
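
As a small example of how a semantic standard and a persistent identifier can combine in practice, the sketch below builds a Schema.org `Dataset` description as JSON-LD and checks it against a short list of required fields. The property names (`name`, `description`, `identifier`, `license`, `distribution`, `contentUrl`, `encodingFormat`) are Schema.org terms; the `REQUIRED_FIELDS` policy, the function names, and all values and URLs are placeholders assumed for illustration.

```python
import json
from typing import List

# Illustrative local policy, not a rule taken from Schema.org or the FAIR principles.
REQUIRED_FIELDS = ("name", "description", "identifier", "license")


def dataset_record(name: str, description: str, identifier: str,
                   license_url: str, download_url: str, encoding_format: str) -> dict:
    """Build a minimal JSON-LD description of a dataset using Schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": license_url,
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
            "encodingFormat": encoding_format,
        },
    }


def missing_fields(record: dict) -> List[str]:
    """List required fields that are absent or empty."""
    return [key for key in REQUIRED_FIELDS if not record.get(key)]


record = dataset_record(
    name="Example tensile test series",
    description="Placeholder description of stress-strain measurements.",
    identifier="https://hdl.handle.net/20.500.99999/000001",  # placeholder PID
    license_url="https://example.org/licenses/public-domain",
    download_url="https://example.org/data/tensile-series.csv",
    encoding_format="text/csv",
)
serialized = json.dumps(record, indent=2)
incomplete = dict(record, license="")
print(missing_fields(incomplete))
```

A check of this kind can run at ingestion time, so that records missing the fields a community depends on are flagged before they reach a repository.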

LIMS architecture

Capabilities of LIMS

A LIMS helps a researcher by providing capabilities in an integrated way. These capabilities are delivered by one or more of the system components and serve as drivers for LIMS requirements. From the user perspective, common data-oriented functions such as sample tracking might be more readily identified in terms of these capabilities. These may take on slightly different naming conventions and forms across implementations, yet in general they can be described in the context of the function they support within research environments. Capabilities may be mapped into the design architecture through implementation of modular components, as shown in Fig. 3. For external vendor and community open LIMS platforms or sub-components, understanding how and where in the system these capabilities are supported is also key to evaluating their compatibility with research workflows.

Core capabilities generally fall into one of several categories. A few example aspects supported by each category are provided.

  • Data generation: provenance of data, software descriptions and custom code, instrument configuration, data acquisition descriptions
  • Digital information management: data models, data formats and reformatting, file and data set access, persistent identifier resolution, curation, sample tracking
  • Data ingestion: absorption of data, attachment of metadata, digital transformations, support for combinations of structured and non-structured data sources
  • Data archiving: long term preservation, automated backup, user permission access controls, life cycle maintenance
  • Processing and analysis: instrument and model calibration records, data pipelines for computation of derived or final results, visualization and reporting tools
  • Data publishing and sharing: table or figure preparation, organization, crossholdings search, data as a service
  • Digital asset movement: internal and external sharing, migration between platforms, exports to other platforms, life cycle management, and other capabilities that support capture and organization of data from equipment, preservation and recovery of data, tracking of samples, automated conversion of data to open formats, linking of information between systems, data querying for download, and countless others as needs evolve



Figure 3. LIMS components grouped by major functional category

LIMS components

The architecture of LIMS is built with a variety of functional components (Fig. 3), which may be implemented using different platform services, tools, and interfaces. They are shown here grouped into a set of four functional categories (generation, persistence, communication, and consumption & distribution), which are also illustrated in the LIMS networked architecture of Fig. 6 (discussed later). Systems engineering factors in capabilities, components, and research requirements to build out an integrated LIMS, such as the sample workflow model illustrated in Fig. 4. The model shows examples of data generation (instruments, electronic laboratory notebooks [ELNs][14], or simulations), which are managed by intake to repository and file storage to support processing, analysis, and user search and access.



Figure 4. A basic LIMS functional workflow model

Research workflows

Requirements for each research discipline may be unique and require customized workflows. There are many common patterns, though each instrument and component implementation may have a unique interface for the input and output. It is critical when defining workflows to incorporate the perspective of experimentalists (or theoreticians and computational scientists) such that key inputs and outputs are captured in the correct sequence and factor in the human interaction dependencies and touchpoints. Several common themes include activity capture, management and recall of varied data, availability of infrastructure and resource skills, and iterative data planning to improve upon practice. One approach is to define the native workflow (perhaps through a white-boarding exercise) to identify the principal experimental attributes, e.g., instrument, calendar (time), sample preparation and characterization, laboratory notes and observations, and processes. Another exercise complementary to the research workflow is the definition of downstream data queries which SMEs would routinely use. The emerging theme of “decision science” has been introduced as the concept which helps use-case-driven design factor in which questions (queries) the LIMS will support through access to holdings/services/tools. These queries motivate the process of analyzing, describing, and defining workflows. Furthermore, they provide scientifically motivated context for LIMS design such that the granularity of structured data and operational utility of the system will provide appropriate functionality for analytical tools.
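
The role of downstream queries in shaping the data structure can be shown with a toy example. Suppose an SME routinely asks "which samples prepared a certain way were measured on a given instrument since a given date?" That question implies the record must carry at least the four fields below. The `Measurement` type, the `find` function, and all values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Measurement:
    """Fields chosen because the example query needs them, not from any NIST schema."""
    sample_id: str
    instrument: str
    preparation: str
    acquired: date


def find(records: Iterable[Measurement],
         instrument: Optional[str] = None,
         preparation: Optional[str] = None,
         since: Optional[date] = None) -> List[Measurement]:
    """Return records matching every criterion that was supplied."""
    results = []
    for record in records:
        if instrument is not None and record.instrument != instrument:
            continue
        if preparation is not None and record.preparation != preparation:
            continue
        if since is not None and record.acquired < since:
            continue
        results.append(record)
    return results


records = [
    Measurement("S-001", "SEM-01", "polished", date(2022, 1, 10)),
    Measurement("S-002", "SEM-01", "etched", date(2022, 2, 3)),
    Measurement("S-003", "XRD-02", "polished", date(2022, 3, 15)),
]

polished_on_sem = find(records, instrument="SEM-01", preparation="polished")
recent = find(records, since=date(2022, 2, 1))
```

If sample preparation had been captured only as free text in a notebook, the first query could not be answered reliably, which is the sense in which queries determine the required granularity of structured data.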

Examples of research workflow are illustrated in Fig. 5, showing how native workflows serve as use cases to identify which LIMS components are required to support specific capabilities.



Figure 5. Assorted research workflows in native model representations

LIMS networked architecture

An end-to-end architecture workflow view with networked functionality is shown in Fig. 6. Data progress from generation to persistence via communication, leading to consumption and distribution. This illustrates, at a conceptual level, a generalized workflow with networked systems. The operational requirements for an end-to-end architecture that supports scientific applications include infrastructure reliability, performance, reference “playbooks” for developers, and consistent availability of microservices, which provide the flexibility for integration and functionality.



Figure 6. LIMS networked architecture concept with four stages of operation

The LIMS infrastructure takes into consideration the supporting services and sources of information and is critical in providing the foundation for applications. Sub-LIMS (i.e., specialized systems such as freezer systems and inventory & tracking systems) also need to be accounted for, in particular those that will continue to involve manual inputs and operations. The “people” resources for LIMS, shared infrastructure, and specialized systems are likewise critical for building and maintaining an environment for successful operation.

Metrics and qualitative measures

Just as important as establishing and vetting the vision, strategic goals, and operating principles outlined here is establishing a set of metrics that can be used internally and externally to quantify how well NIST MML LIMS efforts are progressing. These metrics include:

  • LIMS usage statistics: MML and NIST are beginning the process of collecting statistics on usage and impact of data and software systems. These are expected to be useful for project implementation as well as for all activities in which MML promotes the benefits of data-driven research. To achieve this, LIMS efforts will require supporting systems for sharing usage of instrumentation and technology in MML.
  • High science impact capabilities: While we cannot guarantee publication or adoption in high-impact science areas, we should be able to demonstrate the potential for such results to be produced. Metrics should focus on data products, features, and processes that increase the potential for high-impact science. Peer-reviewed papers, data publications, and industry impact measurements will be used to demonstrate this measure.
  • Collaborative activities and capabilities: As a major focus for our future, tracking the number and extent of collaborative activities will be important. Tracking should include quality and cost effectiveness assessments.
  • Cost, schedule, and scope targets for milestones: These metrics will be used for demonstrating value, as well as for highlighting potential efficiency areas.
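
To make the first metric more tangible, the following sketch shows one way usage statistics might be aggregated from instrument session logs. It is an assumption-laden illustration only: the `Session` record, its fields, and the choice of counting sessions, distinct users, and hours are hypothetical and are not taken from any MML reporting system.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Session(NamedTuple):
    """One logged use of an instrument (hypothetical log format)."""
    instrument: str
    user: str
    hours: float


def usage_summary(sessions: Iterable[Session]) -> Dict[str, Dict[str, float]]:
    """Aggregate session count, distinct users, and total hours per instrument."""
    counts: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[str]] = defaultdict(set)
    hours: Dict[str, float] = defaultdict(float)
    for s in sessions:
        counts[s.instrument] += 1
        users[s.instrument].add(s.user)
        hours[s.instrument] += s.hours
    return {
        name: {"sessions": counts[name], "users": len(users[name]), "hours": hours[name]}
        for name in counts
    }


sessions = [
    Session("TEM-1", "alice", 2.0),
    Session("TEM-1", "bob", 1.5),
    Session("TEM-1", "alice", 0.5),
    Session("XRD-2", "carol", 3.0),
]
summary = usage_summary(sessions)
```

Counts like these cover only the usage metric; the science impact, collaboration, and milestone metrics in the list depend on qualitative assessment that a simple aggregation cannot capture.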

Conclusion

In the MML research environment, a wide array of applications and digital tools is used to conduct both experimental and theoretical measurement science. We have embarked upon a new mission space for data management and sharing, which we believe will significantly enhance the scientific return on investment in our laboratory. Adoption of LIMS provides a modern approach toward this goal by embedding data and software systems that support research workflows and capture information in a manner compatible with the FAIR Data Principles.[12] In this roadmap, we introduce important concepts for the design and implementation of a LIMS. LIMS are already beginning to demonstrate their value to science through more efficient access to data for analysis over time. Our next-generation LIMS will provide standards for interoperability and collaboration, further enabling scientific investigation that spans experimental groups. With proper LIMS design in place, solutions will evolve in tandem with research domain knowledge, providing an essential resource to stimulate scientific innovation.

Footnotes

  1. Any mention of commercial products is for information only; it does not imply recommendation or endorsement by NIST.

Acronyms and initialisms

AFRL: Air Force Research Laboratory

AI: artificial intelligence

ARKid: Archival Resource Key identifiers

API: application programming interface

AWS: Amazon Web Services

BNL: Brookhaven National Laboratory

CDCS: Configurable Data Curation System

CDS: central data storage

CIS: Critical Security

COI: community of interest

DCAT: Data Catalog Vocabulary

DCN: data compute node

DOI: digital object identifier

DSSI: Data Science and Systems Integration

ELN: electronic laboratory notebook

EM: environmental monitor

ESNET: Energy Sciences Network

ETL: extract, transform, and load

FAIR: findable, accessible, interoperable, and reusable

FTP: File Transfer Protocol

HPC: high-performance computing

IaaS: infrastructure as a service

IC: instrument controller

IT: information technology

LIMS: laboratory information management system

MI: monitored instrumentation

ML: machine learning

MML: Material Measurement Laboratory

NAS: network-attached storage

NASA: National Aeronautics and Space Administration

NCAL: NIST Center for Automotive Lightweighting

NIST: National Institute of Standards and Technology

NREL: National Renewable Energy Laboratory

NSLS-II: National Synchrotron Light Source-II

ODI: Office of Data and Informatics (in MML, NIST)

ORCID: Open Researcher and Contributor IDentifier

ORNL: Oak Ridge National Laboratory

OTS: off-the-shelf

PID: persistent identifier

REN: Research Equipment Network

REST: representational state transfer

SDK: software development kit

SME: subject matter expert

VM: virtual machine

VPN: virtual private network

Acknowledgements

The authors would like to thank the numerous internal and external collaborators who have shared their knowledge, experience, and access to LIMS solutions and tools. The Configurable Data Curation System project team within the NIST Information Technology Laboratory Software and Systems Division has devoted considerable time and effort to design and implementation for several of the solutions described. The NIST Office of Information Systems Management, Research Services Office has been a key collaborator for design, deployment, and operations within the NIST infrastructure.

References

  1. "Resources". Federal Data Strategy: Leveraging Data as a Strategic Asset. GSA Technology Transformation Services. https://strategy.data.gov/resources/. Retrieved April 2022. 
  2. Knapp, G.L.; Mukherjee, T.; Zuback, J.S.; Wei, H.L.; Palmer, T.A.; De, A.; DebRoy, T. (1 August 2017). "Building blocks for a digital twin of additive manufacturing" (in en). Acta Materialia 135: 390–399. doi:10.1016/j.actamat.2017.06.039. https://linkinghub.elsevier.com/retrieve/pii/S1359645417305141. 
  3. Talley, Kevin R.; White, Robert; Wunder, Nick; Eash, Matthew; Schwarting, Marcus; Evenson, Dave; Perkins, John D.; Tumas, William et al. (1 December 2021). "Research data infrastructure for high-throughput experimental materials science" (in en). Patterns 2 (12): 100373. doi:10.1016/j.patter.2021.100373. PMC PMC8672147. PMID 34950901. https://linkinghub.elsevier.com/retrieve/pii/S266638992100235X. 
  4. Bollinger, K.; Salyards, J.; Satcher, R. et al. (1 August 2020). "A Landscape Study of Laboratory Information Management Systems (LIMS) for Forensic Crime Laboratories". National Institute of Justice Forensic Technology Center of Excellence. https://nij.ojp.gov/library/publications/landscape-study-laboratory-information-management-systems-forensic-crime. Retrieved April 2022. 
  5. Stansberry, Dale; Somnath, Suhas; Breet, Jessica; Shutt, Gregory; Shankar, Mallikarjun (7 April 2020). "DataFed: Towards Reproducible Research via Federated Data Management". arXiv:2004.03710 [cs]. http://arxiv.org/abs/2004.03710. 
  6. Arkilic, A.; Allan, D. B.; Caswell, T.A.; Li, L.; Lauer, K.; Abeykoon, S. (4 March 2017). "Towards Integrated Facility-Wide Data Acquisition and Analysis at NSLS-II" (in en). Synchrotron Radiation News 30 (2): 44–45. doi:10.1080/08940886.2017.1289810. ISSN 0894-0886. https://www.tandfonline.com/doi/full/10.1080/08940886.2017.1289810. 
  7. Taillon, Joshua A.; Bina, Thomas F.; Plante, Raymond L.; Newrock, Marcus W.; Greene, Gretchen R.; Lau, June W. (1 June 2021). "NexusLIMS: A Laboratory Information Management System for Shared-Use Electron Microscopy Facilities" (in en). Microscopy and Microanalysis 27 (3): 511–527. doi:10.1017/S1431927621000222. ISSN 1431-9276. PMC PMC8551308. PMID 33908340. https://www.cambridge.org/core/product/identifier/S1431927621000222/type/journal_article. 
  8. Lau, June W.; Devers, Rachel F.; Newrock, Marcus; Greene, Gretchen (17 December 2019). "Laboratory Information Management Systems for Electron Microscopy: Evaluation of the 4CeeD Data Curation Platform" (in en). Journal of Research of the National Institute of Standards and Technology 124: 124034. doi:10.6028/jres.124.034. ISSN 2165-7254. PMC PMC7351567. PMID 34877182. https://nvlpubs.nist.gov/nistpubs/jres/124/jres.124.034.pdf. 
  9. "Research Data Management". UC Santa Cruz University Library. https://guides.library.ucsc.edu/datamanagement/. Retrieved April 2022. 
  10. Hopson, M.; McFadden, V.; Refoy, R. et al. (September 2020). "De-risking Government Technology: Federal Agency Field Guide". 18F, General Services Administration. https://derisking-guide.18f.gov/federal-field-guide/. Retrieved April 2022. 
  11. Dart, Eli; Rotman, Lauren; Tierney, Brian; Hester, Mary; Zurawski, Jason (2014). "The Science DMZ: A Network Design Pattern for Data-Intensive Science" (in en). Scientific Programming. doi:10.3233/spr-140382. https://www.hindawi.com/journals/sp/2014/701405/. 
  12. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (1 December 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC PMC4792175. PMID 26978244. http://www.nature.com/articles/sdata201618. 
  13. Wilkinson, Mark D.; Dumontier, Michel; Sansone, Susanna-Assunta; Bonino da Silva Santos, Luiz Olavo; Prieto, Mario; Batista, Dominique; McQuilton, Peter; Kuhn, Tobias et al. (1 December 2019). "Evaluating FAIR maturity through a scalable, automated, community-governed framework" (in en). Scientific Data 6 (1): 174. doi:10.1038/s41597-019-0184-5. ISSN 2052-4463. PMC PMC6754447. PMID 31541130. http://www.nature.com/articles/s41597-019-0184-5. 
  14. Gates, Richard S.; McLean, Mark J.; Osborn, William A. (1 December 2015). "Smart Electronic Laboratory Notebooks for the NIST Research Environment" (in en). Journal of Research of the National Institute of Standards and Technology 120: 293. doi:10.6028/jres.120.018. ISSN 2165-7254. PMC PMC4730679. PMID 26958447. https://nvlpubs.nist.gov/nistpubs/jres/120/jres.120.018.pdf. 

Notes

This document falls in the U.S. public domain and is republished courtesy of the National Institute of Standards and Technology. This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. The original document mislabels the figures; they are corrected in this version. A few external links were also added or corrected. The original uses numbers for footnotes, but this wiki uses letters, by design. Citations 12 and 13 are reversed and were swapped to their correct places for this version.