Difference between revisions of "Journal:Ten simple rules for managing laboratory information"

From LIMSWiki
Jump to navigationJump to search
(Saving and adding more.)
(Saving and adding more.)
Line 61: Line 61:
LIMS can also support the use and service of key laboratory equipment. User manuals, service dates, warranties, and other identifying information can be attached directly to the equipment record, which allows for timely service and maintenance of the equipment. Adding equipment to the inventory can also prevent accidental losses in shared spaces where it is easy for people to borrow equipment without returning it. The label attached to the equipment (see Rule 5, later) acts as an indication of ownership that limits the risk of ownership confusion when almost identical pieces of equipment are owned by neighboring laboratories. As the laboratory inventory should focus on larger, more expensive equipment and supplies, inexpensive and easily obtained equipment (i.e., office supplies) may not need to be inventoried. An additional benefit of inventory management in a LIMS is the ability to create a record connecting specific equipment and supplies to specific people and projects, which can be used to detect potential sources of technical bias and variability (see Rules 4 and 5, later).
LIMS can also support the use and service of key laboratory equipment. User manuals, service dates, warranties, and other identifying information can be attached directly to the equipment record, which allows for timely service and maintenance of the equipment. Adding equipment to the inventory can also prevent accidental losses in shared spaces where it is easy for people to borrow equipment without returning it. The label attached to the equipment (see Rule 5, later) acts as an indication of ownership that limits the risk of ownership confusion when almost identical pieces of equipment are owned by neighboring laboratories. As the laboratory inventory should focus on larger, more expensive equipment and supplies, inexpensive and easily obtained equipment (i.e., office supplies) may not need to be inventoried. An additional benefit of inventory management in a LIMS is the ability to create a record connecting specific equipment and supplies to specific people and projects, which can be used to detect potential sources of technical bias and variability (see Rules 4 and 5, later).


==Rule 3: Focuson your current projects first==
==Rule 3: Focus on your current projects first==
After establishing an inventory of supplies and equipment, it is natural to consider using a similar approach with the samples that have accumulated over the years in freezers or other storage locations. This can be overwhelming because the number of samples will be orders of magnitude larger than the number of supplies. In addition, documenting them is likely to require more effort than simply retrieving a product documentation from a vendor’s catalog.
 
Allocating limited resources to making an inventory of samples generated by past projects may not benefit current and future projects. A more practical approach is to prioritize tracking samples generated by ongoing projects and document samples generated by past projects on an as-needed basis.
 
===Inventory your samples before you generate them===
It is a common mistake to create sample records well after they were produced in the lab. The risks of this retroactive approach to recordkeeping include information loss, as well as selective recordkeeping, in which only some samples are deemed important enough to document while most temporary samples are not, even though they may provide critical information. (This mistake can be compounded in situations where regulatory requirements demand [[data integrity]], including the contemporaneous recording of data.<ref name="AstrixWhatThe22">{{cite web |url=https://astrixinc.com/blog/fda-guidance-for-data-integrity/ |title=What the FDA Guidance on Data Integrity Means for Your Lab |publisher=Astrix, Inc |work=Our Insights |date=11 May 2022 |accessdate=18 March 2024}}</ref>)
 
A more proactive approach avoids these pitfalls. When somebody walks into a lab to start an experiment, the samples that will be generated by this experiment should be known. It is possible to create the computer records corresponding to these samples before initiating the laboratory processes that generates the physical samples. The creation of a sample record can therefore be seen as part of the experiment planning process (Fig. 1). This makes it possible to preemptively print labels that will be used to track samples used at different stages of the process (see Rule 5, later).
 
It may also be useful to assign statuses to samples as they progress through different stages of their life cycle, such as “to do,” “in progress,” “completed,” or “canceled” to differentiate active works in progress from the backlog and samples generated by previous experiments. As the experimental process moves forward, data can be continually appended to the sample record. For example, the field to capture the concentration of a solution would be filled after the solution has been prepared. Thus, the success, or failure, of experiments can be easily documented and used to inform the next round of experiments.
 
===Develop sample retention policies===
It is always unpleasant to have to throw away poorly documented samples. The best strategy to avoid this outcome is to develop policies to discard only samples that will not be used in the future, a process rendered more objective and straightforward with adequate documentation. Properly structured workflows (see Rule 8, later) should define criteria for which samples should be kept and for how long. All lab members should be trained in these policies to ensure consistency, and policies should be revisited as new research operating procedures are initiated.
 
It can be tempting to keep every tube or plate that still contains something as a backup. This conservative strategy generates clutter, confusion, and reproducibility issues, especially in the absence of documentation. While it makes sense to keep some intermediates during the execution of a complex experimental workflow, the successful completion of the experiment should trigger the elimination of intermediates that have lost their purpose, have limited shelf life, and/or are not reusable. During this intermediate step, samples that are deemed as critical backup should be stored in a separate location from the working sample to minimize the likelihood of loss of both samples in case of electrical failure, fire, etc. Using clear labels (see Rules 4 and 5, later) and storing intermediate samples in dedicated storage locations can help with the enforcement of sample disposition policies.
 
==Rule 4: Use computer-generated sample identification numbers==
Generating sample names is probably not the best use of scientists’ creativity. Many labs still rely on manually generated sample names that may look something like “JP PCR 8/23 4.” Manually generated sample names are time-consuming to generate, difficult to interpret, and often contain insufficient information. Therefore, they should not be the primary identifier used to track samples.
 
Instead, computer-generated sample identification numbers (sample ID) should be utilized as the primary ID as they are able to overcome these limitations. Rather than describing the sample, a computer-generated sample ID provides a link between a physical sample and a database entry that contains more information associated with the sample. The Sample ID is the only piece of information that needs to be printed on the sample label (see Rule 5, later) because it allows researchers to retrieve all the sample information from a database. A sample tracking system should rely on both computer-readable and human-readable sample IDs.
 
===Computer-readable IDs===
Since the primary purpose of a sample ID is to provide a link between a physical sample and the computer record that describes the sample, it saves time to rely on sample IDs that can be scanned by a reader or even a smartphone (Fig. 2). [27,28] [[Barcode]]s are special fonts to print data in a format that can be easily read by an optical sensor. [29] There are also newer alternatives, such as quick response (QR) codes, data matrices, or radio-frequency identification (RFID) to tag samples. [30,31] QR codes and data matrices are 2D barcodes that are cheaper to generate than RFID tags and store more data than traditional barcodes. [27] Nevertheless, these technologies encode a key that points to a database record.
 
 
[[File:Fig2 Berezin PLoSCompBio23 19-12.png|500px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="500px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 2.''' Sample label. The first line includes a unique computer-readable barcode as well as a human-readable computer-generated sample identification number. The second and third lines include a description of the sample content, the date, and the identity of the inoculum.</blockquote>
|-
|}
|}
 
Uniqueness is the most important property of the data encoded in barcodes, and the use of unique and persistent identifiers is a critical component of the Findability of your (meta)data. [24] Several vendors now offer products with 2D barcodes printed on the side or bottom of the tube. It is common for such products, as well as lab-made reagents, to be split across multiple tubes or moved from one tube to another. In these cases, each of these “new” samples should have unique barcodes. A barcoding system can therefore facilitate the accurate identification of “parent” samples (e.g., a stock solution with ID X) and the unique “child” samples derived from them (e.g., aliquots of the stock solution with IDs Y and Z).
 
===Human-readable IDs===
While computer-readable IDs should be the main ID used when tracking a sample or supply, it is sometimes necessary for laboratory personnel to have a secondary sample ID they can read without the use of any equipment or while doing manual operations (i.e., handling samples).
 
To make an identifier readable by humans, it is best to keep the ID short and use their structure to provide contextual information. For example, the use of a prefix may help interpret the ID. For example, the ID #CHEM1234 would naturally be interpreted as a chemical or #MCUL2147 as a mammalian culture (Fig. 2).
 
Since these identifiers do not need to map to a unique database entry, human-readable IDs do not have the same uniqueness requirements as computer-readable IDs. For example, it may be acceptable to allow two labs using the same software environment to use the same human-readable ID, because this ID will only be seen in the context of a single lab. The software system should maintain the integrity of the relationships between the human-readable ID and the computer-readable ID by preventing users from editing these identifiers.
 
==Rule 5: Label everything==
Print labels to identify supplies, equipment, samples, storage locations, and any other physical objects used in your lab. Many labs are extensively relying on handwritten markings that create numerous problems. [17] A limited amount of information can be written on small sample containers, and like manually generated sample names, handwritten labels can be difficult to read or interpret.
 
Some labels are self-contained. For example, shipping labels include all the information necessary to deliver a package. However, in a laboratory environment, a sample label must not only identify a physical sample but also establish a connection to a record describing the sample and the data associated with it (Fig. 2).
 
===Content of a label===
Only two pieces of information are necessary on a label: a computer-readable sample ID printed as a barcode and a human-readable sample ID to make it easier for the researcher to work with the sample. If there is enough space to print more information on the label, your research needs should inform your label design. Ensure you have sufficient space to meet regulatory labeling requirements (e.g., biosafety requirements, hazards) and if desired, information such as the sample type, sample creator, date (e.g., of generation or expiration), or information about related samples (e.g., parent/child samples).
 
===Label printing technology===
Putting in place a labeling solution requires the integration of several elements, but once configured, proper use of label printing technologies makes it much faster and easier to print labels than to label tubes manually.
 
There are many types of label printers on the market today, and most are compatible with the Zebra Programming Language (ZPL) standard. [32] Labeling laboratory samples can be challenging due to harsh environmental conditions: exposure to liquid nitrogen or other chemicals, as well as ultra-low or high temperatures, will require specialized labels. For labeling plastic items, thermal transfer will print the most durable labels, especially if used with resin ribbon instead of wax, while inkjet printers can print durable labels for use on paper. [33–35] Furthermore, laboratory samples can be generated in a broad range of sizes, so labels should be adapted to the size of the object they are attached to. A high-resolution printer (300 dpi or greater) will make it possible to print small labels that will be easy to read by humans and scanners. Finally, select permanent or removable labels based on the application. Reusable items should be labeled with removable labels, whereas single-use containers are best labeled with permanent labels.
 
Label printing software applications can take data from a database or a spreadsheet and map different columns to the fields of label templates, helping to standardize your workflows (see Rule 8, later). They also support different formats and barcode standards. Of course, the label printing software needs to be compatible with the label printer. When selecting a barcode scanner, consider whether it supports the barcode standards that will be used in your label, as well as the size and shape of the barcodes it can scan. Inexpensive barcode scanners will have difficulty reading small barcodes printed on curved tubes with limited background, whereas professional scanners with high-performance imagers will effectively scan more challenging labels. When used, barcode scanners transmit a unique series of characters to the computer. How these characters are then used depends on the software application in which the barcode is read. Some applications will simply capture the content of the barcode. Other applications will process barcoded data in real-time to retrieve the content of the corresponding records.
 
==Rule 6: Manage your data proactively==
 
 




Line 68: Line 131:


==Notes==
==Notes==
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. To more easily differentiate footnotes from references, the original footnotes (which were numbered) were updated to use lowercase letters. Most footnotes referencing web pages were turned into proper citations.
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Under "Inventory your samples before you generate them," the original hints at data integrity and requirements for the contemporaneous recording of data, but it's not explicitly stated; this version addresses that, particularly in the scope of regulated labs.


<!--Place all category tags here-->
<!--Place all category tags here-->

Revision as of 22:16, 18 March 2024

Full article title Ten simple rules for managing laboratory information
Journal PLoS Computational Biology
Author(s) Berezin, Casey-Tyler; Aguilera, Luis U.; Billerbeck, Sonja; Bourne, Philip E.; Densmore, Douglas; Freemont, Paul; Gorochowski, Thomas E.; Hernandez, Sarah I.; Hillson, Nathan J.; King, Connor R.; Köpke, Michael; Ma, Shuyi; Miller, Katie M.; Moon, Tae Seok; Moore, Jason H.; Munsky, Brian; Myers, Chris J.; Nicholas, Dequina A.; Peccoud, Samuel J.; Zhou, Wen; Peccoud, Jean
Author affiliation(s) Colorado State University, University of Groningen, University of Virginia, Boston University, Imperial College, University of Bristol, Lawrence Berkeley National Laboratory, US Department of Energy Agile BioFoundry, US Department of Energy Joint BioEnergy Institute, LanzaTech, University of Washington Medicine, Washington University in St. Louis, Cedars-Sinai Medical Centet, University of Colorado Boulder, University of California Irvine - Irvine
Primary contact Email: jean dot peccoud at colostate dot edu
Year published 2023
Volume and issue 19(12)
Article # e1011652
DOI 10.1371/journal.pcbi.1011652
ISSN 1553-7358
Distribution license Creative Commons CC0 1.0 Universal
Website https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011652
Download https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1011652&type=printable (PDF)

Abstract

Information is the cornerstone of research, from experimental data/metadata and computational processes to complex inventories of reagents and equipment. These 10 simple rules discuss best practices for leveraging laboratory information management systems (LIMS) to transform this large information load into useful scientific findings.

Keywords: laboratory information management, laboratory management, laboratory information management systems, LIMS, computational biology, mathematical modeling, transdisciplinary research

Introduction

The development of mathematical models that can predict the properties of biological systems is the holy grail of computational biology. [1,2] Such models can be used to test biological hypotheses [3], quantify the risk of developing diseases [3], guide the development of biomanufactured products [4], engineer new systems meeting user-defined specifications, and much more. [4,5] Irrespective of a model’s application and the conceptual framework used to build it, the modeling process proceeds through a common iterative workflow. A model is first evaluated by fitting its parameters such that its behavior matches experimental data. Models that fit previous observations are then further validated by comparing the model predictions with the results of new observations that are outside the scope of the initial dataset (Fig. 1).


Fig1 Berezin PLoSCompBio23 19-12.png

Figure 1. Information management enhances the experimental and modeling cycle. Our 10 simple rules for managing laboratory information (green) augment the cycle of hypothesis formulation, design, data analysis, modeling, and decision-making (gray). The experimental design phase is improved by carefully tracking your inventory, samples, parameters, and variables. Proactive data management and the thoughtful use of databases facilitate statistical and exploratory analyses as well as the development of conclusions that inform the next round of experiments. Frequent reevaluation of project, team, and workflow success is a critical component of refining experimental processes, developing a common culture, and positioning your research group in the greater scientific context.

Historically, the collection of experimental data and the development of mathematical models were performed by different scientific communities. [6] Computational biologists had little control over the nature and quality of the data they could access. With the emergence of systems biology and synthetic biology, the boundary between experimental and computational biology has become increasingly blurred. [6] Many laboratories and junior scientists now have expertise in both producing and analyzing large volumes of digital data due to high-throughput workflows and an ever-expanding collection of digital instruments. [7] In this context, it is critically important to properly organize the exponentially growing volumes of experimental data to ensure they can support the development of models that can guide the next round of experiments. [8]

We are a group of scientists representing a broad range of scientific specialties, from clinical research to industrial biotechnology. Collectively, we have expertise in experimental biology, data science, and mathematical modeling. Some of us work in academia, while others work in industry. We have all faced the challenges of keeping track of laboratory operations to produce high-quality data suitable for analysis. We have experience using a variety of tools, including spreadsheets, open-source software, homegrown databases, and commercial solutions to manage our data. Irreproducible experiments, projects that failed to meet their goals, datasets we collected but never managed to analyze, and freezers full of unusable samples have taught us the hard way lessons that have led to these 10 simple rules for managing laboratory information.

This journal has published several sets of rules regarding best practices in overall research design [9,10], as well as the computational parts of research workflows, including data management [11–13] and software development practices. [14–16] The purpose of these 10 rules (Fig. 1) is to guide the development and configuration of laboratory information management systems (LIMS). LIMS typically offer electronic laboratory notebook (ELN), inventory, workflow planning, and data management features, allowing users to connect data production and data analysis to ensure that useful information can be extracted from experimental data and increase reproducibility. [17,18] These rules can also be used to develop training programs and lab management policies. Although we all agree that applying these rules increases the value of the data we produce in our laboratories, we also acknowledge that enforcing them is challenging. It relies on the successful integration of effective software tools, training programs, lab management policies, and the will to abide by these policies. Each lab must find the most effective way to adopt these rules to suit their unique environment.

Rule 1: Develop a common culture

Data-driven research projects generally require contributions from multiple stakeholders with complementary expertise. The success of these projects depends on entire teams developing a common vision of the projects' objectives and the approaches to be used. [19–21] Interdisciplinary teams, in particular, must establish a common language, as well as mutual expectations for experimental and publication timelines. [19] Unless the team develops a common culture, one stakeholder group can drive the project and impose its vision on the other groups. Although interdisciplinary (i.e., wet-lab and computational) training is becoming more common in academia, it is not unusual for experimentalists to regard data analysis as a technique they can acquire simply by hiring a student with computer programming skills. In a corporate environment, research informatics is often part of the information technology group whose mission is to support scientists who drive the research agenda. In both situations, the research agenda is driven by stakeholders who are unlikely to produce the most usable datasets because they lack sufficient understanding of data modeling. [20] Perhaps less frequently, there is also the situation where the research agenda is driven by people with expertise in data analysis. Because they may not appreciate the subtleties of experimental methods, they may find it difficult to engage experimentalists in collaborations aimed at testing their models. [20] Alternatively, their research may be limited to the analysis of disparate sets of previously published datasets. [19] Thus, interdisciplinary collaboration is key to maximizing the insights you gain from your data.

The development of a common culture, within a single laboratory or across interdisciplinary research teams, must begin with a thorough onboarding process for each member regarding general lab procedures, research goals, and individual responsibilities and expectations. [21,22] Implementing a LIMS requires perseverance by users, thus a major determinant of the success of a LIMS is whether end-users are involved in the culture development process. [17,23] When the input and suggestions of end-users are considered, they are more likely to engage with and apply good practices to the LIMS on a daily basis. [23] The long-term success of research endeavors then requires continued training and reevaluation of project goals and success (Fig.1). [19,21]

These 10 simple rules apply to transdisciplinary teams that have developed a common culture allowing experimentalists to gain a basic understanding of the modeling process and modelers to have some familiarity with the experimental processes generating the data they will analyze. [19] Teams that lack a common vision of data-driven research are encouraged to work toward acquiring this common vision through frequent communication and mutual goal setting. [19,20] Discussing these 10 simple rules in group meetings may aid in initiating this process.

Rule 2: Start with what you purchase

All the data produced by your lab are derived from things you have purchased, including supplies (consumables), equipment, and contract manufactured reagents, such as oligonucleotides or synthetic genes. In many cases, data (and metadata) on items in your inventory may be just as important as experimentally derived data, and as such, should be managed according to the FAIR (findable, accessible, interoperable, and reusable) principles. [24] Assembling an inventory of supplies and equipment with their associated locations can be handled in a few weeks by junior personnel without major interruption of laboratory operations, although establishing a thorough inventory may be more difficult and time-consuming for smaller labs with fewer members. Nevertheless, managing your lab inventory provides an immediate return on investment by positively impacting laboratory operations in several ways. People can quickly find the supplies and equipment they need to work, supplies are ordered with appropriate advance notice to minimize work stoppage, and data variation is reduced due to standardized supplies and the ability to track lot numbers easily (Fig. 1). [17,25,26]

Many labs still use Excel to keep track of inventory despite the existence of several more sophisticated databases and LIMS (e.g., Benchling, Quartzy, GenoFAB, LabWare, LabVantage, and TeselaGen). [25] These can facilitate real-time inventory tracking unlike a static document, increasing the findability and accessibility of inventory data. While some systems are specialized for certain types of inventories (e.g., animal colonies or frozen reagents), others are capable of tracking any type of reagent or item imaginable. [25] When considering what items to keep track of, there are three main considerations: expiration, maintenance, and ease of access.

Most labs manage their supplies through periodic cleanups of the lab, during which they sort through freezers, chemical cabinets, and other storage areas; review their contents; and dispose of supplies that are past their expiration date or are no longer useful. By actively tracking expiration dates and reagent use in a LIMS, you can decrease the frequency of such cleanups since the LIMS will alert users when expiration dates are approaching or when supplies are running low. This can prevent costly items from being wasted because they are expired or forgotten, and furthermore, the cost of products can be tracked and used to inform which experiments are performed.

LIMS can also support the use and service of key laboratory equipment. User manuals, service dates, warranties, and other identifying information can be attached directly to the equipment record, which allows for timely service and maintenance of the equipment. Adding equipment to the inventory can also prevent accidental losses in shared spaces where it is easy for people to borrow equipment without returning it. The label attached to the equipment (see Rule 5, later) acts as an indication of ownership that limits the risk of ownership confusion when almost identical pieces of equipment are owned by neighboring laboratories. As the laboratory inventory should focus on larger, more expensive equipment and supplies, inexpensive and easily obtained equipment (i.e., office supplies) may not need to be inventoried. An additional benefit of inventory management in a LIMS is the ability to create a record connecting specific equipment and supplies to specific people and projects, which can be used to detect potential sources of technical bias and variability (see Rules 4 and 5, later).

Rule 3: Focus on your current projects first

After establishing an inventory of supplies and equipment, it is natural to consider using a similar approach with the samples that have accumulated over the years in freezers or other storage locations. This can be overwhelming because the number of samples will be orders of magnitude larger than the number of supplies. In addition, documenting them is likely to require more effort than simply retrieving a product documentation from a vendor’s catalog.

Allocating limited resources to making an inventory of samples generated by past projects may not benefit current and future projects. A more practical approach is to prioritize tracking samples generated by ongoing projects and document samples generated by past projects on an as-needed basis.

Inventory your samples before you generate them

It is a common mistake to create sample records well after they were produced in the lab. The risks of this retroactive approach to recordkeeping include information loss, as well as selective recordkeeping, in which only some samples are deemed important enough to document while most temporary samples are not, even though they may provide critical information. (This mistake can be compounded in situations where regulatory requirements demand data integrity, including the contemporaneous recording of data.[1])

A more proactive approach avoids these pitfalls. When somebody walks into a lab to start an experiment, the samples that will be generated by this experiment should be known. It is possible to create the computer records corresponding to these samples before initiating the laboratory processes that generates the physical samples. The creation of a sample record can therefore be seen as part of the experiment planning process (Fig. 1). This makes it possible to preemptively print labels that will be used to track samples used at different stages of the process (see Rule 5, later).

It may also be useful to assign statuses to samples as they progress through different stages of their life cycle, such as “to do,” “in progress,” “completed,” or “canceled” to differentiate active works in progress from the backlog and samples generated by previous experiments. As the experimental process moves forward, data can be continually appended to the sample record. For example, the field to capture the concentration of a solution would be filled after the solution has been prepared. Thus, the success, or failure, of experiments can be easily documented and used to inform the next round of experiments.

Develop sample retention policies

It is always unpleasant to have to throw away poorly documented samples. The best strategy to avoid this outcome is to develop policies to discard only samples that will not be used in the future, a process rendered more objective and straightforward with adequate documentation. Properly structured workflows (see Rule 8, later) should define criteria for which samples should be kept and for how long. All lab members should be trained in these policies to ensure consistency, and policies should be revisited as new research operating procedures are initiated.

It can be tempting to keep every tube or plate that still contains something as a backup. This conservative strategy generates clutter, confusion, and reproducibility issues, especially in the absence of documentation. While it makes sense to keep some intermediates during the execution of a complex experimental workflow, the successful completion of the experiment should trigger the elimination of intermediates that have lost their purpose, have limited shelf life, and/or are not reusable. During this intermediate step, samples that are deemed as critical backup should be stored in a separate location from the working sample to minimize the likelihood of loss of both samples in case of electrical failure, fire, etc. Using clear labels (see Rules 4 and 5, later) and storing intermediate samples in dedicated storage locations can help with the enforcement of sample disposition policies.

Rule 4: Use computer-generated sample identification numbers

Generating sample names is probably not the best use of scientists’ creativity. Many labs still rely on manually generated sample names that may look something like “JP PCR 8/23 4.” Manually generated sample names are time-consuming to generate, difficult to interpret, and often contain insufficient information. Therefore, they should not be the primary identifier used to track samples.

Instead, computer-generated sample identification numbers (sample ID) should be utilized as the primary ID as they are able to overcome these limitations. Rather than describing the sample, a computer-generated sample ID provides a link between a physical sample and a database entry that contains more information associated with the sample. The Sample ID is the only piece of information that needs to be printed on the sample label (see Rule 5, later) because it allows researchers to retrieve all the sample information from a database. A sample tracking system should rely on both computer-readable and human-readable sample IDs.

Computer-readable IDs

Since the primary purpose of a sample ID is to provide a link between a physical sample and the computer record that describes the sample, it saves time to rely on sample IDs that can be scanned by a reader or even a smartphone (Fig. 2). [27,28] Barcodes are special fonts to print data in a format that can be easily read by an optical sensor. [29] There are also newer alternatives, such as quick response (QR) codes, data matrices, or radio-frequency identification (RFID) to tag samples. [30,31] QR codes and data matrices are 2D barcodes that are cheaper to generate than RFID tags and store more data than traditional barcodes. [27] Nevertheless, these technologies encode a key that points to a database record.


Fig2 Berezin PLoSCompBio23 19-12.png

Figure 2. Sample label. The first line includes a unique computer-readable barcode as well as a human-readable computer-generated sample identification number. The second and third lines include a description of the sample content, the date, and the identity of the inoculum.

Uniqueness is the most important property of the data encoded in barcodes, and the use of unique and persistent identifiers is a critical component of the Findability of your (meta)data. [24] Several vendors now offer products with 2D barcodes printed on the side or bottom of the tube. It is common for such products, as well as lab-made reagents, to be split across multiple tubes or moved from one tube to another. In these cases, each of these “new” samples should have unique barcodes. A barcoding system can therefore facilitate the accurate identification of “parent” samples (e.g., a stock solution with ID X) and the unique “child” samples derived from them (e.g., aliquots of the stock solution with IDs Y and Z).

Human-readable IDs

While computer-readable IDs should be the main ID used when tracking a sample or supply, it is sometimes necessary for laboratory personnel to have a secondary sample ID they can read without the use of any equipment or while doing manual operations (i.e., handling samples).

To make an identifier readable by humans, it is best to keep the ID short and use their structure to provide contextual information. For example, the use of a prefix may help interpret the ID. For example, the ID #CHEM1234 would naturally be interpreted as a chemical or #MCUL2147 as a mammalian culture (Fig. 2).

Since these identifiers do not need to map to a unique database entry, human-readable IDs do not have the same uniqueness requirements as computer-readable IDs. For example, it may be acceptable to allow two labs using the same software environment to use the same human-readable ID, because this ID will only be seen in the context of a single lab. The software system should maintain the integrity of the relationships between the human-readable ID and the computer-readable ID by preventing users from editing these identifiers.

Rule 5: Label everything

Print labels to identify supplies, equipment, samples, storage locations, and any other physical objects used in your lab. Many labs are extensively relying on handwritten markings that create numerous problems. [17] A limited amount of information can be written on small sample containers, and like manually generated sample names, handwritten labels can be difficult to read or interpret.

Some labels are self-contained. For example, shipping labels include all the information necessary to deliver a package. However, in a laboratory environment, a sample label must not only identify a physical sample but also establish a connection to a record describing the sample and the data associated with it (Fig. 2).

Content of a label

Only two pieces of information are necessary on a label: a computer-readable sample ID printed as a barcode and a human-readable sample ID to make it easier for the researcher to work with the sample. If there is enough space to print more information on the label, your research needs should inform your label design. Ensure you have sufficient space to meet regulatory labeling requirements (e.g., biosafety requirements, hazards) and if desired, information such as the sample type, sample creator, date (e.g., of generation or expiration), or information about related samples (e.g., parent/child samples).

Label printing technology

Putting in place a labeling solution requires the integration of several elements, but once configured, proper use of label printing technologies makes it much faster and easier to print labels than to label tubes manually.

There are many types of label printers on the market today, and most are compatible with the Zebra Programming Language (ZPL) standard. [32] Labeling laboratory samples can be challenging due to harsh environmental conditions: exposure to liquid nitrogen or other chemicals, as well as ultra-low or high temperatures, will require specialized labels. For labeling plastic items, thermal transfer will print the most durable labels, especially if used with resin ribbon instead of wax, while inkjet printers can print durable labels for use on paper. [33–35] Furthermore, laboratory samples can be generated in a broad range of sizes, so labels should be adapted to the size of the object they are attached to. A high-resolution printer (300 dpi or greater) will make it possible to print small labels that will be easy to read by humans and scanners. Finally, select permanent or removable labels based on the application. Reusable items should be labeled with removable labels, whereas single-use containers are best labeled with permanent labels.

Label printing software applications can take data from a database or a spreadsheet and map different columns to the fields of label templates, helping to standardize your workflows (see Rule 8, later). They also support different formats and barcode standards. Of course, the label printing software needs to be compatible with the label printer. When selecting a barcode scanner, consider whether it supports the barcode standards that will be used in your label, as well as the size and shape of the barcodes it can scan. Inexpensive barcode scanners will have difficulty reading small barcodes printed on curved tubes with limited background, whereas professional scanners with high-performance imagers will effectively scan more challenging labels. When used, barcode scanners transmit a unique series of characters to the computer. How these characters are then used depends on the software application in which the barcode is read. Some applications will simply capture the content of the barcode. Other applications will process barcoded data in real-time to retrieve the content of the corresponding records.

Rule 6: Manage your data proactively

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Under "Inventory your samples before you generate them," the original hints at data integrity and requirements for the contemporaneous recording of data, but it's not explicitly stated; this version addresses that, particularly in the scope of regulated labs.