Journal:Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences

From LIMSWiki
Revision as of 23:07, 7 December 2021 by Shawndouglas
Full article title Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences
Journal Data Science Journal
Author(s) Damerow, Joan E.; Varadharajan, Charuleka; Boye, Kristin; Brodie, Eoin L.; Burrus, Madison; Chadwick, K. Dana; Crystal-Ornelas, Robert; Elbashandy, Hesham; Alves, Ricardo J.E.; Ely, Kim S.; Goldman, Amy E.; Haberman, Ted; Hendrix, Valerie; Kakalia, Zarine; Kemner, Kenneth M.; Kersting, Annie B.; Merino, Nancy; O'Brien, Fianna; Perzan, Zach; Robles, Emily; Sorensen, Patrick; Stegen, James C.; Walls, Ramona L.; Weisenhorn, Pamela; Zavarin, Mavrik; Agarwal, Deborah
Author affiliation(s) Lawrence Berkeley National Laboratory, SLAC National Accelerator Laboratory, Stanford University, Brookhaven National Laboratory, Pacific Northwest National Laboratory, Metadata Game Changers, Argonne National Laboratory, Lawrence Livermore National Laboratory, University of Arizona
Primary contact Email: JoanDamerow at lbl dot gov
Year published 2021
Volume and issue 20(1)
Article # 11
DOI 10.5334/dsj-2021-011
ISSN 1683-1470
Distribution license Creative Commons Attribution 4.0 International
Website https://datascience.codata.org/articles/10.5334/dsj-2021-011/
Download https://datascience.codata.org/articles/10.5334/dsj-2021-011/galley/1055/download/ (PDF)

Abstract

Physical samples are foundational entities for research across the biological, Earth, and environmental sciences. Data generated from sample-based analyses are not only the basis of individual studies, but can also be integrated with other data to answer new and broader-scale questions. Ecosystem studies increasingly rely on multidisciplinary team-based science to study climate and environmental changes. While there are widely adopted conventions within certain domains to describe sample data, these have gaps when applied in a multidisciplinary context.

In this study, we reviewed existing practices for identifying, characterizing, and linking related environmental samples. We then tested the practicality of assigning persistent identifiers with standardized metadata to samples in a pilot field test involving eight United States Department of Energy projects. Participants collected a variety of sample types, with analyses conducted across multiple facilities. We address terminology gaps for multidisciplinary research and make recommendations for assigning identifiers and metadata that support sample tracking, integration, and reuse. Our goal is to provide a practical approach to sample management, geared towards ecosystem scientists who contribute and reuse sample data.

Keywords: International GeoSample Numbers (IGSN), physical samples, soil, water, plant, leaf, microbial communities, related identifiers, persistent identifiers

Introduction

The study of natural ecosystems requires multidisciplinary science teams to understand and model processes from molecular to global scales.[1] Many research activities involve diverse collections of samples and associated field or laboratory measurements.[2][3] For example, studies of organic matter cycling through plants and soil involve analysis of samples to represent soil biogeochemistry, microbial communities, plant structures, leaf gas exchange, and traits of the specific organisms involved.[4][5][6] Each scientific expert, project team, and discipline has a responsibility to ensure that others can interpret, integrate, and reuse their sample data to help solve emerging problems as our global environment continues to change.[7]

Collaboration across disciplines requires a more unified approach to report basic information about key data entities, such as samples. One challenge in promoting a unified way of reporting sample data is that some research communities have already developed community-specific conventions, including those for omics samples[8][9][10], biodiversity records[11], and geoscience samples.[2][12] A larger challenge is that many researchers use no formal reporting conventions, or exclude information needed to interpret and reuse the data.[13] More coordination is needed across these communities to develop a multidisciplinary reporting format for physical samples that is widely adopted, or to ensure that standards are interoperable. Common reporting would support effective discovery, integration, and reuse of sample data that spans scientific domains.

Sample identifiers are also needed to associate and manage important information describing a sample (i.e., metadata), such as the location, date, environmental context, and purpose of sample collection. For multidisciplinary studies, the task of generating and managing unique sample identifiers and associated metadata can be complicated, particularly as important contextual information is added throughout the data lifecycle.[14] Samples are sent to different collaborators, laboratories, and user facilities, and then combined into a variety of digital records and publications (Figure 1).[15] As a result, scientists face challenges in managing data and metadata, tracking samples, and integrating and reusing valuable sample data. Left unaddressed, these inefficiencies lead to data and metadata loss and limit the potential for scientific discovery.



Figure 1. Tracking interdisciplinary samples throughout the cycle of field collection, transport to collaborators and other labs, various analyses, and digital records

Our overall goal was to address the sample identification and metadata needs of ecosystem scientists; this work was driven by the user community of the U.S. Department of Energy’s (DOE’s) data repository for Earth and environmental sciences, the Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE).[16] The DOE’s Environmental Systems Science (ESS) program relies on multidisciplinary, team-based science to study complex processes within terrestrial ecosystems, spanning from the bedrock through the rhizosphere and vegetation to the atmospheric surface layer.[17] This community is well positioned to help address specific challenges in standardizing and integrating data and metadata about a variety of environmental samples (e.g., soil, water, plant, and associated biological material used for omics analyses); these challenges apply broadly to environmental research.[18][19][20][21][22]

We focus on sample identifiers and metadata that support the FAIR Guiding Principles (findability, accessibility, interoperability, and reusability) from the multidisciplinary domain-science perspective.[23][24][25][26][27] We therefore use a community-focused approach to: a.) evaluate existing options for sample identifiers and metadata descriptions for ecosystem science samples; b.) pilot the process of standardizing sample information to evaluate practical issues from domain-science perspectives; and c.) outline practical recommendations for sample identifier allocation, tracking, and associated metadata.

Methods

Review of existing sample identifiers, metadata conventions, and standards

ESS-DIVE’s work on sample identifiers and metadata began in response to a specific problem that DOE ESS scientists raised during community meetings: tracking multidisciplinary samples as they are sent to different labs and user facilities. As a community-focused data repository, we approached this issue by leading or participating in a variety of community discussions on sample identifiers and associated metadata. These included:

  • presenting identifier options in an ESS community webinar and whitepaper;
  • engaging in discussion with each pilot test participant;
  • holding several meetings with U.S. DOE user facilities and data systems representatives (Joint Genome Institute, National Microbiome Data Collaborative, Environmental Molecular Sciences Laboratory, and DOE Systems Biology Knowledgebase);
  • participating in broader community meetings on identifier and metadata practices for physical samples (Earth Science Information Partners [ESIP] and Research Data Alliance [RDA]);
  • participating in a National Microbiome Data Collaborative (NMDC) Ontology workshop;
  • participating in a USGS workshop on sample collection metadata for the National Digital Catalogue; and
  • participating in the IGSN 2040 Steering Committee and business planning.

After reviewing the scope and use of available persistent identifier (PID) options (Table 1) and considering input from community discussions, we focused further comparison on International GeoSample Numbers (IGSNs) and Archival Resource Keys (ARKs), which are most commonly used for a variety of sample types (Additional files, Supplemental Table 1). Considerations in the identifier assessment included association with a broader international community focused on sample identification and description, associated metadata to describe samples and their relationships, availability of user-friendly infrastructure to mint identifiers and validate metadata, general ease of use, and other technical identifier characteristics, listed in Additional files, Supplemental Table 1.

Table 1. Examples of PIDs that have been used for samples, modified from Guralnick et al.[28]
 
ARK = Archival Resource Keys, URN = Uniform Resource Name, URI = Uniform Resource Identifier, DOI = Digital Object Identifier, UUID = Universally Unique Identifier, IGSN = International GeoSample Number, CETAF = Consortium of the European Taxonomic Facilities, RRID = Research Resource Identifier.
Identifier type Identifier example Scope
ARK ark:/12148/btv1b8449691v Flexible
URN urn:catalog:UMMZ:Mammals:171041 Flexible
HTTP URI http://data.rbge.org.uk/herb/E00115694 Flexible
DOI 10.7299/X7VQ32SJ Flexible, mostly papers and datasets
UUID EF0A4D3E-702F-4882-81B8- CA737AEB7B28 Flexible
IGSN IGSN: IECUR0002 Geoscience, working to become general physical sample identifier
CETAF URI (based on HTTP URI) http://data.rbge.org.uk/herb/E00421503 Species Occurrence, Specimens from CETAF institutions
RRID RRID:MGI:5630441 Biomedical Research Resources
BioSample accession number SAMN03983893 Biological source materials used in experimental assays
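Several of the schemes in Table 1 can be told apart by their surface syntax alone, which helps when auditing a sample spreadsheet that mixes identifier types. The sketch below is illustrative only: the regular expressions are simplified assumptions of our own, not the official syntax rules of any scheme, and a CETAF URI is reported simply as an HTTP URI because that is what it is.

```python
import re

# Simplified, assumed patterns; the real syntax rules for each scheme are richer.
# Order matters: the first matching pattern wins.
ID_PATTERNS = [
    ("ARK", re.compile(r"^ark:/?\d+/\S+$", re.IGNORECASE)),
    ("URN", re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)),
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("UUID", re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
        re.IGNORECASE)),
    ("IGSN", re.compile(r"^IGSN:\s*[A-Z0-9]+$", re.IGNORECASE)),
    ("RRID", re.compile(r"^RRID:\S+$")),
    ("BioSample", re.compile(r"^SAM(N|EA|D)\d+$")),
    ("HTTP URI", re.compile(r"^https?://\S+$")),
]


def classify_identifier(identifier: str) -> str:
    """Return the name of the first scheme whose pattern matches, else 'unknown'."""
    value = identifier.strip()
    for scheme, pattern in ID_PATTERNS:
        if pattern.match(value):
            return scheme
    return "unknown"
```

For example, `classify_identifier("SAMN03983893")` returns `"BioSample"`. Syntax checks like these only flag likely scheme mismatches; confirming that an identifier is actually registered still requires resolving it.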

We also reviewed existing metadata standards and templates that are relevant for samples collected by environmental scientists, including general digital object standards[29][30][31], biodiversity records[11][32], omics (e.g. genomics, metagenomics) material[8][10][33], and geoscience samples[12][34] (see Additional files, Supplemental Table 2). We created a translation table comparing 49 metadata elements (see Additional files, Supplemental Table 3) in human-readable format. The translation table depicts linkages where metadata elements were common across standards, as well as differences.
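A translation table of this kind can also serve as a machine-readable crosswalk for renaming fields between standards. The sketch below is hypothetical: its four rows are not the authors' table, and the SESAR, Darwin Core, and MIxS element names shown are illustrative and should be checked against each standard's current term list.

```python
from typing import Dict

# Hypothetical crosswalk excerpt: one row per concept, one entry per standard.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "latitude": {"SESAR": "latitude", "DarwinCore": "decimalLatitude"},
    "longitude": {"SESAR": "longitude", "DarwinCore": "decimalLongitude"},
    "collection date": {"SESAR": "collection_start_date", "DarwinCore": "eventDate",
                        "MIxS": "collection_date"},
    "collector": {"SESAR": "collector", "DarwinCore": "recordedBy"},
}


def translate(record: Dict[str, str], source: str, target: str) -> Dict[str, str]:
    """Rename a record's fields from `source` element names to `target` element names.

    Fields with no counterpart in the target standard are left out, which is
    the kind of gap a translation table is meant to expose.
    """
    out: Dict[str, str] = {}
    for names in CROSSWALK.values():
        src, dst = names.get(source), names.get(target)
        if src is not None and dst is not None and src in record:
            out[dst] = record[src]
    return out


def mixs_lat_lon(latitude: float, longitude: float) -> str:
    # Not every mapping is one-to-one: MIxS lat_lon holds both coordinates
    # in a single space-separated string (decimal degrees, WGS84).
    return f"{latitude} {longitude}"
```

Translating a SESAR-style record to MIxS with this excerpt keeps only the collection date; the coordinates need the separate `mixs_lat_lon` step because the two standards split location differently.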

The core IGSN Descriptive Metadata Schema[35] includes basic metadata associated with sample collection, which is generally relevant across sample types. This schema links metadata profiles that differ across six currently functioning IGSN allocating agents. The System For Earth Sample Registration (SESAR; the first allocating agent) has no access restrictions for obtaining IGSNs and provides user-friendly services for sample management.[36] The SESAR metadata profile and controlled terms are currently focused on geoscience samples, but the IGSN organization seeks to accommodate multiple disciplines and has already expanded into plant and other biological samples for some IGSN allocating agents. Our translation table for sample metadata allowed us to identify metadata elements and terms that could be revised or extended within the SESAR profile for improved representation of other sample types (see Additional files, Supplemental Table 3).

Biology-related standards are well established, commonly used in the community, and particularly important for ecosystem science samples. Genomic and metagenomic analyses and data publication require use of standards developed by the Genomic Standards Consortium (GSC)[8], namely Minimum Information about any (x) Sequence (MIxS) and Minimum Information about a Metagenome Sequence (MIMS).[10] Darwin Core is a metadata standard for biodiversity records that has been widely adopted across the biocollections community.[11] It is also required for submitting data to the Global Biodiversity Information Facility (GBIF), which allows global search and integration of biodiversity records.[37][38] As a data aggregator, GBIF provides a valuable service and has thus driven standards adoption, enabling a wide range of data reuse applications in published biodiversity studies[37][39], including over 5,000 known citations from studies using biodiversity records.[40]

We researched ontologies that could be used to describe a broad set of environmental sample types, including the Biological Collections Ontology (BCO)[41], Environment Ontology (ENVO)[42], Population and Community Ontology (PCO)[43], and Plant Ontology (PO)[44] to identify additional or alternate terms to generally describe other types of soil, sediment, water, gas, and biology-related samples.[45]

We also engaged with the broader, international community working on sample-related practices. This community is led by members of the IGSN organization, with participation from other national agencies (e.g., USGS, CSIRO, and the Australian Research Data Commons [ARDC]) and data organizations (ESIP and RDA). This participation was important for identifying best practices in identifier and metadata use and for contributing ecosystem science perspectives to the broader community working on sample standardization. Continued participation in the broader informatics and domain science communities is important for improving the interoperability and usability of sample-related standards.

Sample identifier and metadata testing in the field

In order to develop a sample metadata reporting format that was informed by our domain science community, we worked with scientists from eight different Environmental Systems Science projects to conduct a pilot test for using sample PIDs and metadata. In particular, we tested the practicality of the IGSN, which appeared to be the best choice amongst relevant PIDs for our purposes. These projects had varying scopes and sample types, and were all funded by DOE’s Office of Science Environmental Systems Science (ESS) program (see Additional files, Supplemental Table 4).

Prior to sample registration, we discussed with representatives from each project the expected sample types, how to assign IGSNs and link related samples, the essential metadata needed to understand specific sample types, and past sample tracking workflows. Some projects had already collected samples and preferred to register them for IGSNs after collection so that the IGSNs could be associated with digital files, while other projects pre-registered their samples before collection or registered them directly after collection. We used initial feedback and background research to identify several core descriptive sample metadata fields likely to be necessary for effective searches on ESS-DIVE, including standardized information on the following[45] (also see Additional files, Supplemental Table 3 for the full translation table comparing metadata elements from existing standards and templates):

  • IGSN and Parent IGSN (where relevant)
  • Sample Name (project-specific sample name; must be unique)
  • Chief Scientist/Collector
  • Sample Type fields:
    • Object Type (e.g., Individual sample, core, site)
    • Material (e.g., Liquid-aqueous, Rock, Soil, Biology)
    • Sampled Feature (primary physiographic feature from which the sample was collected)
  • Location Information (Latitude and Longitude in WGS84; Location description)
  • Date (ISO 8601; e.g., 1954-04-07)
  • Collection Method Description
  • Project

Note that this list represents the initial set of IGSN metadata fields that we proposed as required; the list was subsequently revised following our pilot test work. Many additional metadata fields are available and are recommended or optional depending on the sample type.[12]
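The core fields above map naturally onto a simple record type with a few automated checks. The following is a minimal sketch, assuming Python field names of our own choosing and a Material term list limited to the examples named in the list (not the final vocabulary); it checks WGS84 coordinate ranges, ISO 8601 calendar dates, the Material term, and project-wide uniqueness of Sample Name.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumed term list: only the Material examples named above.
MATERIALS = {"Liquid-aqueous", "Rock", "Soil", "Biology"}


@dataclass
class SampleRecord:
    igsn: str
    sample_name: str          # project-specific; must be unique
    collector: str
    object_type: str          # e.g. "Individual sample", "core", "site"
    material: str
    sampled_feature: str
    latitude: float           # decimal degrees, WGS84
    longitude: float          # decimal degrees, WGS84
    collection_date: str      # ISO 8601 calendar date, e.g. "1954-04-07"
    collection_method: str
    project: str
    parent_igsn: Optional[str] = None


def validate(record: SampleRecord) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    errors: List[str] = []
    if record.material not in MATERIALS:
        errors.append(f"unrecognized material: {record.material!r}")
    if not -90.0 <= record.latitude <= 90.0:
        errors.append("latitude outside [-90, 90]")
    if not -180.0 <= record.longitude <= 180.0:
        errors.append("longitude outside [-180, 180]")
    try:
        # Accepts YYYY-MM-DD (Python 3.11+ also accepts other ISO 8601 date forms).
        date.fromisoformat(record.collection_date)
    except ValueError:
        errors.append(f"not an ISO 8601 date: {record.collection_date!r}")
    return errors


def duplicate_names(records: List[SampleRecord]) -> List[str]:
    """Return sample names that occur more than once, in order of first repeat."""
    seen, dups = set(), []
    for r in records:
        if r.sample_name in seen and r.sample_name not in dups:
            dups.append(r.sample_name)
        seen.add(r.sample_name)
    return dups
```

Partial ISO 8601 dates such as "1954-04" are rejected by this sketch; a production template would need to decide whether to allow them.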

The researchers involved in our testing used SESAR’s sample management portal MySESAR to register samples and update metadata. We recommended a specific workflow for participants to register their samples and update sample collection metadata, outlined in our GitHub repository[46] and associated dataset.[45]

We also worked with individual participants to map sample history, from collection in the field through a variety of analyses to publication (Figure 2). This exercise helped us determine sample tracking needs and develop recommendations for assigning PIDs and linking closely related samples and subsamples.



Figure 2. Sample journey map, using the sample PID and metadata to document sample history and link related samples in the WHONDRS project.[20][47] PNNL = Pacific Northwest National Laboratory; EMSL = Environmental Molecular Sciences Laboratory; ORNL = Oak Ridge National Laboratory; GOLD = Genomes Online Database.
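Parent IGSN links turn a set of registered samples into a tree, so a subsample's history can be recovered by following parent links back to the root (for example, a site). The sketch below uses made-up placeholder identifiers (HYP0001 and so on), not real IGSNs, and a plain dictionary in place of any registry lookup; it is not a description of the WHONDRS workflow in Figure 2.

```python
from typing import Dict, List, Optional


def lineage(igsn: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow Parent IGSN links from a sample up to its root, nearest first."""
    chain = [igsn]
    seen = {igsn}
    parent = parent_of.get(igsn)
    while parent is not None:
        if parent in seen:
            raise ValueError(f"cycle in parent links at {parent}")
        chain.append(parent)
        seen.add(parent)
        parent = parent_of.get(parent)
    return chain


def children_of(parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the parent links so each sample lists its direct subsamples."""
    tree: Dict[str, List[str]] = {}
    for child, parent in parent_of.items():
        if parent is not None:
            tree.setdefault(parent, []).append(child)
    return tree


# Hypothetical placeholder identifiers: a site, a sediment core from it,
# and two subsamples split from the core for different analyses.
PARENT_OF: Dict[str, Optional[str]] = {
    "HYP0001": None,       # site
    "HYP0002": "HYP0001",  # core collected at the site
    "HYP0003": "HYP0002",  # subsample sent to lab A
    "HYP0004": "HYP0002",  # subsample sent to lab B
}
```

Here `lineage("HYP0004", PARENT_OF)` walks from the lab B subsample to the core and then the site, while `children_of` answers the reverse question of which subsamples were split from a given core.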

After sample collection and registration, we asked and discussed the following questions:

  1. What sample collection metadata is needed to understand resulting sample data?
  2. How much effort did it take to register samples and standardize metadata?
  3. What is needed to make sample PID registration and standardization easier?

Developing the final IGSN-ESS reporting guidelines

We used a combination of research on existing standards and pilot test feedback to develop final recommendations for allocating identifiers and assigning standard metadata.[45] We took extensive notes during meetings with pilot test participants and compiled specific feedback on improving guidance for allocating identifiers and relationships, on the metadata needed to understand relevant sample types, and on improving the efficiency of sample registration and standardization. Pilot test participants identified metadata elements that needed to be added, modified, or removed to improve relevance for multidisciplinary ecosystem science samples. We then used our translation table (see Additional files, Supplemental Table 3) comparing other existing standards to guide specific recommendations. For example, to address feedback regarding inefficiencies in providing all metadata at the individual sample level, we added the Darwin Core elements Location ID, Collection ID, and Event ID. We then reviewed existing, commonly used ontologies (ENVO, BCO, PO) to select important vocabulary terms to characterize sample type, material, and environmental context. We developed a list of relevant terms based on the pilot test studies, and all participants helped decide on our final term lists, specifically for object type and material.
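Recording shared context once, under an event or location identifier, avoids repeating the same metadata on every sample. The sketch below assumes hypothetical in-memory tables keyed by Darwin Core-style `eventID` and `locationID` values; the merge rule (sample values override event values, which override location values) is an illustrative choice, not documented ESS-DIVE behavior.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def resolve(sample: Record, events: Dict[str, Record],
            locations: Dict[str, Record]) -> Record:
    """Fill in a sample's metadata from the event and location it references.

    Precedence (an assumption of this sketch): values on the sample itself win,
    then event values, then location values.
    """
    event = events.get(sample.get("eventID"), {})
    location_id = sample.get("locationID", event.get("locationID"))
    location = locations.get(location_id, {})
    merged: Record = {}
    merged.update(location)
    merged.update(event)
    merged.update(sample)
    return merged


# Hypothetical shared records: one sampling event at one location.
LOCATIONS: Dict[str, Record] = {
    "LOC-01": {"locationID": "LOC-01", "latitude": 38.92, "longitude": -106.95,
               "location_description": "Example hillslope plot"},
}
EVENTS: Dict[str, Record] = {
    "EV-01": {"eventID": "EV-01", "locationID": "LOC-01",
              "collection_date": "2019-08-14", "collector": "J. Doe"},
}
```

With this layout, each sample collected during an event only needs its own name, IGSN, and `eventID`; coordinates and dates are filled in from the shared records, and a sample can still override a value, such as a later collection date, where it differs.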

All feedback was addressed in our final recommendations, which we compiled in GitHub and in more user-friendly GitBook documentation. This documentation includes instructions on registering samples for IGSNs using our revised template; specific definitions, instructions, and examples for each metadata element; lists of terms for elements where a controlled vocabulary is needed; instructions for contributing feedback using GitHub; and guidance on how to cite the final format. To develop the documentation, we used the ESS-DIVE community GitHub for samples, inspired by the user-friendly documentation for Darwin Core; this approach facilitates additional community feedback (through public GitHub issues) and versioning. We presented our final recommendations and documentation in two additional community webinars, which were advertised to ESS-DIVE users and ESS scientists and published on the ESS-DIVE website. The purpose of these webinars was to present our conclusions and collect any additional feedback.

As a community-oriented data repository, we will continue to gather feedback and develop additional tools to support users in submitting, searching for, integrating, and reusing high-quality sample data.

Results

References

  1. Weart, Spencer (26 February 2013). "Rise of interdisciplinary research on climate". Proceedings of the National Academy of Sciences 110 (Supplement 1): 3657–3664. doi:10.1073/pnas.1107482109. PMC PMC3586608. PMID 22778431. https://www.pnas.org/content/110/Supplement_1/3657. 
  2. 2.0 2.1 Devaraju, A.; Klump, J.; Cox, S.J.D. et al. (1 November 2016). "Representing and publishing physical sample descriptions" (in en). Computers & Geosciences 96: 1–10. doi:10.1016/j.cageo.2016.07.018. ISSN 0098-3004. https://www.sciencedirect.com/science/article/pii/S0098300416302023. 
  3. Ponsero, Alise J; Bomhoff, Matthew; Blumberg, Kai; Youens-Clark, Ken; Herz, Nina M; Wood-Charlson, Elisha M; Delong, Edward F; Hurwitz, Bonnie L (31 July 2020). "Planet Microbe: a platform for marine microbiology to discover and analyze interconnected ‘omics and environmental data". Nucleic Acids Research 49 (D1): D792–D802. doi:10.1093/nar/gkaa637. ISSN 0305-1048. PMC PMC7778950. PMID 32735679. https://academic.oup.com/nar/article/49/D1/D792/5879428. 
  4. Cordeiro, Amanda L.; Norby, Richard J.; Andersen, Kelly M.; Valverde-Barrantes, Oscar; Fuchslueger, Lucia; Oblitas, Erick; Hartley, Iain P.; Iversen, Colleen M. et al. (2020). "Fine-root dynamics vary with soil depth and precipitation in a low-nutrient tropical forest in the Central Amazonia" (in en). Plant-Environment Interactions 1 (1): 3–16. doi:10.1002/pei3.10010. ISSN 2575-6265. https://onlinelibrary.wiley.com/doi/abs/10.1002/pei3.10010. 
  5. Malik, Ashish A.; Martiny, Jennifer B. H.; Brodie, Eoin L.; Martiny, Adam C.; Treseder, Kathleen K.; Allison, Steven D. (1 January 2020). "Defining trait-based microbial strategies with consequences for soil carbon cycling under climate change" (in en). The ISME Journal 14 (1): 1–9. doi:10.1038/s41396-019-0510-0. ISSN 1751-7370. PMC PMC6908601. PMID 31554911. https://www.nature.com/articles/s41396-019-0510-0. 
  6. Treseder, Kathleen K.; Balser, Teri C.; Bradford, Mark A.; Brodie, Eoin L.; Dubinsky, Eric A.; Eviner, Valerie T.; Hofmockel, Kirsten S.; Lennon, Jay T. et al. (3 September 2011). "Integrating microbial ecology into ecosystem models: challenges and priorities". Biogeochemistry 109 (1-3): 7–18. doi:10.1007/s10533-011-9636-5. ISSN 0168-2563. http://dx.doi.org/10.1007/s10533-011-9636-5. 
  7. Soranno, Patricia A.; Schimel, David S. (2014). "Macrosystems ecology: big data, big ecology" (in en). Frontiers in Ecology and the Environment 12 (1): 3–3. doi:10.1890/1540-9295-12.1.3. ISSN 1540-9309. https://onlinelibrary.wiley.com/doi/abs/10.1890/1540-9295-12.1.3. 
  8. 8.0 8.1 8.2 Field, Dawn; Amaral-Zettler, Linda; Cochrane, Guy; Cole, James R.; Dawyndt, Peter; Garrity, George M.; Gilbert, Jack; Glöckner, Frank Oliver et al. (21 June 2011). "The Genomic Standards Consortium" (in en). PLOS Biology 9 (6): e1001088. doi:10.1371/journal.pbio.1001088. ISSN 1545-7885. PMC PMC3119656. PMID 21713030. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001088. 
  9. Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna et al. (27 October 2014). "The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification". Nucleic Acids Research 43 (D1): D1099–D1106. doi:10.1093/nar/gku950. ISSN 1362-4962. PMC PMC4384021. PMID 25348402. https://academic.oup.com/nar/article/43/D1/D1099/2439522. 
  10. 10.0 10.1 10.2 Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R.; Amaral-Zettler, Linda; Gilbert, Jack A.; Karsch-Mizrachi, Ilene et al. (1 May 2011). "Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications" (in en). Nature Biotechnology 29 (5): 415–420. doi:10.1038/nbt.1823. ISSN 1546-1696. PMC PMC3367316. PMID 21552244. https://www.nature.com/articles/nbt.1823. 
  11. 11.0 11.1 11.2 Wieczorek, John; Bloom, David; Guralnick, Robert; Blum, Stan; Döring, Markus; Giovanni, Renato; Robertson, Tim; Vieglais, David (6 January 2012). "Darwin Core: An Evolving Community-Developed Biodiversity Data Standard" (in en). PLOS ONE 7 (1): e29715. doi:10.1371/journal.pone.0029715. ISSN 1932-6203. PMC PMC3253084. PMID 22238640. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0029715. 
  12. 12.0 12.1 12.2 System For Earth Sample Registration (SESAR) (6 February 2020) (in en). SESAR Batch Registration Quick Guide. doi:10.5281/ZENODO.3874923. https://zenodo.org/record/3874923. 
  13. Roche, Dominique G.; Kruuk, Loeske E. B.; Lanfear, Robert; Binning, Sandra A. (10 November 2015). "Public Data Archiving in Ecology and Evolution: How Well Are We Doing?" (in en). PLOS Biology 13 (11): e1002295. doi:10.1371/journal.pbio.1002295. ISSN 1545-7885. PMC PMC4640582. PMID 26556502. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002295. 
  14. Treloar, Andrew; Klump, Jens (20 December 2019). "Updating the Data Curation Continuum" (in en). International Journal of Digital Curation 14 (1): 87–101. doi:10.2218/ijdc.v14i1.643. ISSN 1746-8256. http://www.ijdc.net/article/view/643. 
  15. Chase, John H.; Bolyen, Evan; Rideout, Jai Ram; Caporaso, J. Gregory (22 December 2015). "cual-id: Globally Unique, Correctable, and Human-Friendly Sample Identifiers for Comparative Omics Studies" (in EN). mSystems. doi:10.1128/mSystems.00010-15. PMC PMC5069752. PMID 27822516. https://journals.asm.org/doi/abs/10.1128/mSystems.00010-15. 
  16. Varadharajan, C.; Cholia, S.; Snavely, C. et al. (8 January 2019). "Launching an Accessible Archive of Environmental Data" (in en-US). Eos. doi:10.1029/2019eo111263. http://eos.org/science-updates/launching-an-accessible-archive-of-environmental-data. 
  17. Biological and Environmental Research Advisory Committee (2017). "Grand Challenges for Biological and Environmental Research: Progress and Future Vision" (PDF). U.S. Department of Energy. https://genomicscience.energy.gov/BERfiles/BERAC-2017-Grand-Challenges-Report.pdf. 
  18. Chadwick, K. Dana; Brodrick, Philip G.; Grant, Kathleen; Goulden, Tristan; Henderson, Amanda; Falco, Nicola; Wainwright, Haruko; Williams, Kenneth H. et al. (2020). "Integrating airborne remote sensing and field campaigns for ecology and Earth system science" (in en). Methods in Ecology and Evolution 11 (11): 1492–1508. doi:10.1111/2041-210X.13463. ISSN 2041-210X. https://onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.13463. 
  19. Serbin, Shawn P.; Wu, Jin; Ely, Kim S.; Kruger, Eric L.; Townsend, Philip A.; Meng, Ran; Wolfe, Brett T.; Chlus, Adam et al. (2019). "From the Arctic to the tropics: multibiome prediction of leaf mass per area using leaf reflectance" (in en). New Phytologist 224 (4): 1557–1568. doi:10.1111/nph.16123. ISSN 1469-8137. https://onlinelibrary.wiley.com/doi/abs/10.1111/nph.16123. 
  20. 20.0 20.1 Stegen, James C.; Goldman, Amy E. (9 October 2018). "WHONDRS: a Community Resource for Studying Dynamic River Corridors" (in EN). mSystems. doi:10.1128/mSystems.00151-18. PMC PMC6178584. PMID 30320221. https://journals.asm.org/doi/abs/10.1128/mSystems.00151-18. 
  21. Wu, Jin; Rogers, Alistair; Albert, Loren P.; Ely, Kim; Prohaska, Neill; Wolfe, Brett T.; Oliveira, Raimundo Cosme; Saleska, Scott R. et al. (2019). "Leaf reflectance spectroscopy captures variation in carboxylation capacity across species, canopy environment and leaf age in lowland moist tropical forests" (in en). New Phytologist 224 (2): 663–674. doi:10.1111/nph.16029. ISSN 1469-8137. https://onlinelibrary.wiley.com/doi/abs/10.1111/nph.16029. 
  22. Wu, Jin; Serbin, Shawn P.; Ely, Kim S.; Wolfe, Brett T.; Dickman, L. Turin; Grossiord, Charlotte; Michaletz, Sean T.; Collins, Adam D. et al. (2020). "The response of stomatal conductance to seasonal drought in tropical forests" (in en). Global Change Biology 26 (2): 823–839. doi:10.1111/gcb.14820. ISSN 1365-2486. https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.14820. 
  23. Beck, Marcus W.; O’Hara, Casey; Lowndes, Julia S. Stewart; Mazor, Raphael D.; Theroux, Susanna; Gillett, David J.; Lane, Belize; Gearheart, Gregory (20 July 2020). "The importance of open science for biological assessment of aquatic environments" (in en). PeerJ 8: e9539. doi:10.7717/peerj.9539. ISSN 2167-8359. PMC PMC7377246. PMID 32742805. https://peerj.com/articles/9539. 
  24. Conze, Ronald; Lorenz, Henning; Ulbricht, Damian; Elger, Kirsten; Gorgas, Thomas (25 January 2017). "Utilizing the International Geo Sample Number Concept in Continental Scientific Drilling During ICDP Expedition COSC-1" (in en). Data Science Journal 16: 2. doi:10.5334/dsj-2017-002. ISSN 1683-1470. http://datascience.codata.org/articles/10.5334/dsj-2017-002/. 
  25. Lehnert, Kerstin; Wyborn, Lesley; Klump, Jens (2019). "FAIR Geoscientific Samples and Data Need International Collaboration" (in en). Acta Geologica Sinica - English Edition 93 (S3): 32–33. doi:10.1111/1755-6724.14236. ISSN 1755-6724. https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-6724.14236. 
  26. Stall, Shelley; Yarmey, Lynn; Cutcher-Gershenfeld, Joel; Hanson, Brooks; Lehnert, Kerstin; Nosek, Brian; Parsons, Mark; Robinson, Erin et al. (1 June 2019). "Make scientific data FAIR" (in en). Nature 570 (7759): 27–29. doi:10.1038/d41586-019-01720-7. https://www.nature.com/articles/d41586-019-01720-7. 
  27. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC PMC4792175. PMID 26978244. https://www.nature.com/articles/sdata201618. 
  28. Guralnick, Robert P.; Cellinese, Nico; Deck, John; Pyle, Richard L.; Kunze, John; Penev, Lyubomir; Walls, Ramona; Hagedorn, Gregor et al. (4 June 2015). "Community Next Steps for Making Globally Unique Identifiers Work for Biocollections Data" (in en). ZooKeys 494: 133–154. doi:10.3897/zookeys.494.9352. ISSN 1313-2970. PMC PMC4400380. PMID 25901117. https://zookeys.pensoft.net/article/5042/. 
  29. DataCite Metadata Working Group (2019). DataCite Metadata Schema for the Publication and Citation of Research Data v4.2. Madeleine de Smaele, Amy Hatfield Hart, Jan Ashton, Isabel Bernal Martinez, Stefanie Dietiker, Jannean Elliot. doi:10.5438/RV0G-AV03. http://schema.datacite.org/meta/kernel-4.2/. 
  30. DCMI Usage Board (20 January 2020). "DCMI Metadata Terms". Dublin Core Metadata Initiative. DCMI. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/. Retrieved 16 September 2020. 
  31. Cox, Simon Jonathan David (2011) (in en). ISO 19156:2011 - Geographic information -- Observations and measurements. International Organization for Standardization. doi:10.13140/2.1.1142.3042. http://rgdoi.net/10.13140/2.1.1142.3042. 
  32. Group, Darwin Core Task (8 November 2014), "Darwin Core: 2014-11-08", Biodiversity Information Standards (TDWG) (Zenodo), doi:10.5281/zenodo.12694, https://zenodo.org/record/12694 
  33. Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna et al. (27 October 2014). "The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification". Nucleic Acids Research 43 (D1): D1099–D1106. doi:10.1093/nar/gku950. ISSN 1362-4962. PMC PMC4384021. PMID 25348402. https://doi.org/10.1093/nar/gku950. 
  34. System For Earth Sample Registration (SESAR) (17 February 2020) (in en). SESAR XML Schema for samples. doi:10.5281/ZENODO.3875531. https://zenodo.org/record/3875531. 
  35. IGSN (24 August 2017). "IGSN metadata". GitHub. https://github.com/IGSN/metadata. 
  36. "Welcome to SESAR". SESAR. 2021. https://www.geosamples.org/. 
  37. Gaiji, Samy; Chavan, Vishwas; Ariño, Arturo H.; Otegui, Javier; Hobern, Donald; Sood, Rajesh; Robles, Estrella (9 July 2013). "Content assessment of the primary biodiversity data published through GBIF network: Status, challenges and potentials". Biodiversity Informatics 8 (2). doi:10.17161/bi.v8i2.4124. ISSN 1546-9735. http://dx.doi.org/10.17161/bi.v8i2.4124. 
  38. Robertson, Tim; Döring, Markus; Guralnick, Robert; Bloom, David; Wieczorek, John; Braak, Kyle; Otegui, Javier; Russell, Laura et al. (6 August 2014). "The GBIF Integrated Publishing Toolkit: Facilitating the Efficient Publishing of Biodiversity Data on the Internet". PLOS ONE 9 (8): e102623. doi:10.1371/journal.pone.0102623. ISSN 1932-6203. PMC PMC4123864. PMID 25099149. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102623. 
  39. Ball-Damerow, Joan E.; Brenskelle, Laura; Barve, Narayani; Soltis, Pamela S.; Sierwald, Petra; Bieler, Rüdiger; LaFrance, Raphael; Ariño, Arturo H. et al. (11 September 2019). "Research applications of primary biodiversity databases in the digital age". PLOS ONE 14 (9): e0215794. doi:10.1371/journal.pone.0215794. ISSN 1932-6203. PMC PMC6738577. PMID 31509534. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0215794. 
  40. "Global Biodiversity Information Facility". Global Biodiversity Information Facility. 2021. https://www.gbif.org/. 
  41. Walls, Ramona L.; Deck, John; Guralnick, Robert; Baskauf, Steve; Beaman, Reed; Blum, Stanley; Bowers, Shawn; Buttigieg, Pier Luigi et al. (3 March 2014). "Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies". PLOS ONE 9 (3): e89606. doi:10.1371/journal.pone.0089606. ISSN 1932-6203. PMC PMC3940615. PMID 24595056. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089606. 
  42. Buttigieg, Pier Luigi; Pafilis, Evangelos; Lewis, Suzanna E.; Schildhauer, Mark P.; Walls, Ramona L.; Mungall, Christopher J. (23 September 2016). "The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation". Journal of Biomedical Semantics 7 (1): 57. doi:10.1186/s13326-016-0097-6. ISSN 2041-1480. PMC PMC5035502. PMID 27664130. https://doi.org/10.1186/s13326-016-0097-6. 
  43. Osumi-Sutherland, D.; Zheng, J.; Buttigieg, P.L. et al. (n.d.). "Population and Community Ontology". https://raw.githubusercontent.com/PopulationAndCommunityOntology/pco/master/pco.owl. 
  44. Avraham, Shulamit; Tung, Chih-Wei; Ilic, Katica; Jaiswal, Pankaj; Kellogg, Elizabeth A.; McCouch, Susan; Pujar, Anuradha; Reiser, Leonore et al. (1 January 2008). "The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations". Nucleic Acids Research 36 (suppl_1): D449–D454. doi:10.1093/nar/gkm908. ISSN 0305-1048. PMC PMC2238838. PMID 18194960. https://academic.oup.com/nar/article/36/suppl_1/D449/2507667. 
  45. Damerow, Joan; Varadharajan, Charu; Boye, Kristin; Brodie, Eoin; Burrus, Madison; Chadwick, Dana; Cholia, Shreyas; Crystal-Ornelas, Robert et al. (2020), "ESS-DIVE Global Sample Numbers and Metadata Reporting Format for Environmental Systems Science (IGSN-ESS)", ESS-DIVE (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem), doi:10.15485/1660470, https://www.osti.gov/servlets/purl/1660470/. Retrieved 7 December 2021.
  46. ESS-DIVE (2021). "ESS-DIVE Sample ID and Metadata Reporting Format (IGSN-ESS) v1.1.0". GitHub. https://github.com/ess-dive-community/essdive-sample-id-metadata. 
  47. Toyoda, Jason G; Goldman, Amy E; Chu, Rosalie K; Danczak, Robert E; Daly, Rebecca A; Garayburu-Caruso, Vanessa A; Graham, Emily B; Lin, Xinming et al. (2020), "WHONDRS Summer 2019 Sampling Campaign: Global River Corridor Surface Water FTICR-MS and Stable Isotopes", ESS-DIVE (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem; Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS)), doi:10.15485/1603775, https://www.osti.gov/servlets/purl/1603775/. Retrieved 7 December 2021.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.