Journal:Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences

From LIMSWiki
Revision as of 21:19, 7 December 2021 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences
Journal Data Science Journal
Author(s) Damerow, Joan E.; Varadharajan, Charuleka; Boye, Kristin; Brodie, Eoin L.; Burrus, Madison; Chadwick, K. Dana; Crystal-Ornelas, Robert; Elbashandy, Hesham; Alves, Ricardo J.E.; Ely, Kim S.; Goldman, Amy E.; Haberman, Ted; Hendrix, Valerie; Kakalia, Zarine; Kemner, Kenneth M.; Kersting, Annie B.; Merino, Nancy; O'Brien, Fianna; Perzan, Zach; Robles, Emily; Sorensen, Patrick; Stegen, James C.; Walls, Ramona L.; Weisenhorn, Pamela; Zavarin, Mavrik; Agarwal, Deborah
Author affiliation(s) Lawrence Berkeley National Laboratory, SLAC National Accelerator Laboratory, Stanford University, Brookhaven National Laboratory, Pacific Northwest National Laboratory, Metadata Game Changers, Argonne National Laboratory, Lawrence Livermore National Laboratory, University of Arizona
Primary contact Email: JoanDamerow at lbl dot gov
Year published 2021
Volume and issue 20(1)
Article # 11
DOI 10.5334/dsj-2021-011
ISSN 1683-1470
Distribution license Creative Commons Attribution 4.0 International
Website https://datascience.codata.org/articles/10.5334/dsj-2021-011/
Download https://datascience.codata.org/articles/10.5334/dsj-2021-011/galley/1055/download/ (PDF)

Abstract

Physical samples are foundational entities for research across the biological, Earth, and environmental sciences. Data generated from sample-based analyses are not only the basis of individual studies, but can also be integrated with other data to answer new and broader-scale questions. Ecosystem studies increasingly rely on multidisciplinary team-based science to study climate and environmental changes. While there are widely adopted conventions within certain domains to describe sample data, these have gaps when applied in a multidisciplinary context.

In this study, we reviewed existing practices for identifying, characterizing, and linking related environmental samples. We then tested the practicalities of assigning persistent identifiers to samples, with standardized metadata, in a pilot field test involving eight United States Department of Energy projects. Participants collected a variety of sample types, with analyses conducted across multiple facilities. We address terminology gaps for multidisciplinary research and make recommendations for assigning identifiers and metadata that support sample tracking, integration, and reuse. Our goal is to provide a practical approach to sample management, geared towards ecosystem scientists who contribute and reuse sample data.

Keywords: International GeoSample Numbers (IGSN), physical samples, soil, water, plant, leaf, microbial communities, related identifiers, persistent identifiers

Introduction

The study of natural ecosystems requires multidisciplinary science teams to understand and model processes from molecular to global scales.[1] Many research activities involve diverse collections of samples and associated field or laboratory measurements.[2][3] For example, studies of organic matter cycling through plants and soil involve analysis of samples to represent soil biogeochemistry, microbial communities, plant structures, leaf gas exchange, and traits of the specific organisms involved.[4][5][6] Each scientific expert, project team, and discipline has a responsibility to ensure that others can interpret, integrate, and reuse their sample data to help solve emerging problems as our global environment continues to change.[7]

Collaboration across disciplines requires a more unified approach to report basic information about key data entities, such as samples. One challenge in promoting a unified way of reporting sample data is that some research communities have already developed community-specific conventions, including those for omics samples[8][9][10], biodiversity records[11], and geoscience samples.[2][12] A larger challenge is that many researchers use no formal reporting conventions, or exclude information needed to interpret and reuse the data.[13] More coordination is needed across these communities to develop a multidisciplinary reporting format for physical samples that is widely adopted, or to ensure that standards are interoperable. Common reporting would support effective discovery, integration, and reuse of sample data that spans scientific domains.

Sample identifiers are also needed to associate and manage important information describing a sample (i.e., metadata), such as the location, date, environmental context, and purpose of sample collection. For multidisciplinary studies, the task of generating and managing unique sample identifiers and associated metadata can be complicated, particularly as important contextual information is added throughout the data lifecycle.[14] Samples are sent to different collaborators, laboratories, and user facilities, and then combined into a variety of digital records and publications (Figure 1).[15] As a result, scientists face challenges with data management, metadata management, sample tracking, and the ability to integrate and reuse valuable sample data. Left unaddressed, these inefficiencies result in data and metadata loss and inhibit the potential for scientific discovery.



Figure 1. Tracking interdisciplinary samples throughout the cycle of field collection, transport to collaborators and other labs, various analyses, and digital records.
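
As a rough illustration of the idea described above (a unique identifier tied to location, date, and purpose of collection, with links between related samples), a minimal sketch in Python follows. The class name, field names, identifiers, and coordinates are hypothetical placeholders chosen for this example; they are not the metadata schema used in the study, and the identifiers are not registered IGSNs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical minimal metadata record keyed by a unique sample identifier."""
    sample_id: str                   # placeholder identifier, not a registered IGSN
    collection_date: date
    latitude: float                  # placeholder coordinates
    longitude: float
    material: str                    # e.g. "soil", "water", "plant"
    purpose: str
    parent_id: Optional[str] = None  # identifier of the sample this one was derived from


def lineage(sample_id: str, registry: Dict[str, SampleRecord]) -> List[str]:
    """Return identifiers from the given sample back to its original field sample."""
    chain: List[str] = []
    current: Optional[str] = sample_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in parent links at {current}")
        chain.append(current)
        current = registry[current].parent_id
    return chain


# A field-collected soil core and a subsample sent on for a different analysis.
core = SampleRecord("EX-0001", date(2020, 6, 1), 38.92, -106.95, "soil", "biogeochemistry")
subsample = SampleRecord("EX-0002", date(2020, 6, 1), 38.92, -106.95, "soil",
                         "microbial community analysis", parent_id="EX-0001")
registry = {r.sample_id: r for r in (core, subsample)}
print(lineage("EX-0002", registry))  # ['EX-0002', 'EX-0001']
```

Keeping the parent link on each record, rather than encoding it in the identifier string itself, is one possible way to let a subsample be traced back to its source after it has moved between laboratories.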

Our overall goal, driven by the user community of the U.S. Department of Energy’s (DOE’s) data repository for Earth and environmental sciences, the Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), was to address the sample identification and metadata needs of ecosystem scientists.[16] The DOE’s Environmental Systems Science (ESS) program relies on multidisciplinary, team-based science to study complex processes within terrestrial ecosystems, spanning from the bedrock through the rhizosphere and vegetation to the atmospheric surface layer.[17] This community is well-positioned to help address specific challenges in standardizing and integrating data and metadata about a variety of environmental samples (e.g., soil, water, plant, and associated biological material used for omics analyses), challenges that apply broadly to environmental research.[18][19][20][21][22]

We focus on sample identifiers and metadata that support the FAIR Guiding Principles (findability, accessibility, interoperability, and reusability) from the multidisciplinary domain-science perspective.[23][24][25][26][27] We therefore use a community-focused approach to (a) evaluate existing options for sample identifiers and metadata descriptions for ecosystem science samples; (b) pilot the process of standardizing sample information to evaluate practical issues from domain-science perspectives; and (c) outline practical recommendations for sample identifier allocation, tracking, and associated metadata.

Methods

References

  1. Weart, Spencer (26 February 2013). "Rise of interdisciplinary research on climate". Proceedings of the National Academy of Sciences 110 (Supplement 1): 3657–3664. doi:10.1073/pnas.1107482109. PMC PMC3586608. PMID 22778431. https://www.pnas.org/content/110/Supplement_1/3657. 
  2. Devaraju, A.; Klump, J.; Cox, S.J.D. et al. (1 November 2016). "Representing and publishing physical sample descriptions" (in en). Computers & Geosciences 96: 1–10. doi:10.1016/j.cageo.2016.07.018. ISSN 0098-3004. https://www.sciencedirect.com/science/article/pii/S0098300416302023. 
  3. Ponsero, Alise J; Bomhoff, Matthew; Blumberg, Kai; Youens-Clark, Ken; Herz, Nina M; Wood-Charlson, Elisha M; Delong, Edward F; Hurwitz, Bonnie L (31 July 2020). "Planet Microbe: a platform for marine microbiology to discover and analyze interconnected ‘omics and environmental data". Nucleic Acids Research 49 (D1): D792–D802. doi:10.1093/nar/gkaa637. ISSN 0305-1048. PMC PMC7778950. PMID 32735679. https://academic.oup.com/nar/article/49/D1/D792/5879428. 
  4. Cordeiro, Amanda L.; Norby, Richard J.; Andersen, Kelly M.; Valverde-Barrantes, Oscar; Fuchslueger, Lucia; Oblitas, Erick; Hartley, Iain P.; Iversen, Colleen M. et al. (2020). "Fine-root dynamics vary with soil depth and precipitation in a low-nutrient tropical forest in the Central Amazonia" (in en). Plant-Environment Interactions 1 (1): 3–16. doi:10.1002/pei3.10010. ISSN 2575-6265. https://onlinelibrary.wiley.com/doi/abs/10.1002/pei3.10010. 
  5. Malik, Ashish A.; Martiny, Jennifer B. H.; Brodie, Eoin L.; Martiny, Adam C.; Treseder, Kathleen K.; Allison, Steven D. (1 January 2020). "Defining trait-based microbial strategies with consequences for soil carbon cycling under climate change" (in en). The ISME Journal 14 (1): 1–9. doi:10.1038/s41396-019-0510-0. ISSN 1751-7370. PMC PMC6908601. PMID 31554911. https://www.nature.com/articles/s41396-019-0510-0. 
  6. Treseder, Kathleen K.; Balser, Teri C.; Bradford, Mark A.; Brodie, Eoin L.; Dubinsky, Eric A.; Eviner, Valerie T.; Hofmockel, Kirsten S.; Lennon, Jay T. et al. (3 September 2011). "Integrating microbial ecology into ecosystem models: challenges and priorities". Biogeochemistry 109 (1-3): 7–18. doi:10.1007/s10533-011-9636-5. ISSN 0168-2563. http://dx.doi.org/10.1007/s10533-011-9636-5. 
  7. Soranno, Patricia A.; Schimel, David S. (2014). "Macrosystems ecology: big data, big ecology" (in en). Frontiers in Ecology and the Environment 12 (1): 3–3. doi:10.1890/1540-9295-12.1.3. ISSN 1540-9309. https://onlinelibrary.wiley.com/doi/abs/10.1890/1540-9295-12.1.3. 
  8. Field, Dawn; Amaral-Zettler, Linda; Cochrane, Guy; Cole, James R.; Dawyndt, Peter; Garrity, George M.; Gilbert, Jack; Glöckner, Frank Oliver et al. (21 June 2011). "The Genomic Standards Consortium" (in en). PLOS Biology 9 (6): e1001088. doi:10.1371/journal.pbio.1001088. ISSN 1545-7885. PMC PMC3119656. PMID 21713030. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001088. 
  9. Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna et al. (27 October 2014). "The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification". Nucleic Acids Research 43 (D1): D1099–D1106. doi:10.1093/nar/gku950. ISSN 1362-4962. PMC PMC4384021. PMID 25348402. https://academic.oup.com/nar/article/43/D1/D1099/2439522. 
  10. Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R.; Amaral-Zettler, Linda; Gilbert, Jack A.; Karsch-Mizrachi, Ilene et al. (1 May 2011). "Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications" (in en). Nature Biotechnology 29 (5): 415–420. doi:10.1038/nbt.1823. ISSN 1546-1696. PMC PMC3367316. PMID 21552244. https://www.nature.com/articles/nbt.1823. 
  11. Wieczorek, John; Bloom, David; Guralnick, Robert; Blum, Stan; Döring, Markus; Giovanni, Renato; Robertson, Tim; Vieglais, David (6 January 2012). "Darwin Core: An Evolving Community-Developed Biodiversity Data Standard" (in en). PLOS ONE 7 (1): e29715. doi:10.1371/journal.pone.0029715. ISSN 1932-6203. PMC PMC3253084. PMID 22238640. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0029715. 
  12. System For Earth Sample Registration (SESAR) (6 February 2020) (in en). SESAR Batch Registration Quick Guide. doi:10.5281/ZENODO.3874923. https://zenodo.org/record/3874923. 
  13. Roche, Dominique G.; Kruuk, Loeske E. B.; Lanfear, Robert; Binning, Sandra A. (10 November 2015). "Public Data Archiving in Ecology and Evolution: How Well Are We Doing?" (in en). PLOS Biology 13 (11): e1002295. doi:10.1371/journal.pbio.1002295. ISSN 1545-7885. PMC PMC4640582. PMID 26556502. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002295. 
  14. Treloar, Andrew; Klump, Jens (20 December 2019). "Updating the Data Curation Continuum" (in en). International Journal of Digital Curation 14 (1): 87–101. doi:10.2218/ijdc.v14i1.643. ISSN 1746-8256. http://www.ijdc.net/article/view/643. 
  15. Chase, John H.; Bolyen, Evan; Rideout, Jai Ram; Caporaso, J. Gregory (22 December 2015). "cual-id: Globally Unique, Correctable, and Human-Friendly Sample Identifiers for Comparative Omics Studies" (in EN). mSystems. doi:10.1128/mSystems.00010-15. PMC PMC5069752. PMID 27822516. https://journals.asm.org/doi/abs/10.1128/mSystems.00010-15. 
  16. Varadharajan, C.; Cholia, S.; Snavely, C. et al. (8 January 2019). "Launching an Accessible Archive of Environmental Data" (in en-US). Eos. doi:10.1029/2019eo111263. http://eos.org/science-updates/launching-an-accessible-archive-of-environmental-data. 
  17. Biological and Environmental Research Advisory Committee (2017). "Grand Challenges for Biological and Environmental Research: Progress and Future Vision" (PDF). U.S. Department of Energy. https://genomicscience.energy.gov/BERfiles/BERAC-2017-Grand-Challenges-Report.pdf. 
  18. Chadwick, K. Dana; Brodrick, Philip G.; Grant, Kathleen; Goulden, Tristan; Henderson, Amanda; Falco, Nicola; Wainwright, Haruko; Williams, Kenneth H. et al. (2020). "Integrating airborne remote sensing and field campaigns for ecology and Earth system science" (in en). Methods in Ecology and Evolution 11 (11): 1492–1508. doi:10.1111/2041-210X.13463. ISSN 2041-210X. https://onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.13463. 
  19. Serbin, Shawn P.; Wu, Jin; Ely, Kim S.; Kruger, Eric L.; Townsend, Philip A.; Meng, Ran; Wolfe, Brett T.; Chlus, Adam et al. (2019). "From the Arctic to the tropics: multibiome prediction of leaf mass per area using leaf reflectance" (in en). New Phytologist 224 (4): 1557–1568. doi:10.1111/nph.16123. ISSN 1469-8137. https://onlinelibrary.wiley.com/doi/abs/10.1111/nph.16123. 
  20. Stegen, James C.; Goldman, Amy E. (9 October 2018). "WHONDRS: a Community Resource for Studying Dynamic River Corridors" (in EN). mSystems. doi:10.1128/mSystems.00151-18. PMC PMC6178584. PMID 30320221. https://journals.asm.org/doi/abs/10.1128/mSystems.00151-18. 
  21. Wu, Jin; Rogers, Alistair; Albert, Loren P.; Ely, Kim; Prohaska, Neill; Wolfe, Brett T.; Oliveira, Raimundo Cosme; Saleska, Scott R. et al. (2019). "Leaf reflectance spectroscopy captures variation in carboxylation capacity across species, canopy environment and leaf age in lowland moist tropical forests" (in en). New Phytologist 224 (2): 663–674. doi:10.1111/nph.16029. ISSN 1469-8137. https://onlinelibrary.wiley.com/doi/abs/10.1111/nph.16029. 
  22. Wu, Jin; Serbin, Shawn P.; Ely, Kim S.; Wolfe, Brett T.; Dickman, L. Turin; Grossiord, Charlotte; Michaletz, Sean T.; Collins, Adam D. et al. (2020). "The response of stomatal conductance to seasonal drought in tropical forests" (in en). Global Change Biology 26 (2): 823–839. doi:10.1111/gcb.14820. ISSN 1365-2486. https://onlinelibrary.wiley.com/doi/abs/10.1111/gcb.14820. 
  23. Beck, Marcus W.; O’Hara, Casey; Lowndes, Julia S. Stewart; Mazor, Raphael D.; Theroux, Susanna; Gillett, David J.; Lane, Belize; Gearheart, Gregory (20 July 2020). "The importance of open science for biological assessment of aquatic environments" (in en). PeerJ 8: e9539. doi:10.7717/peerj.9539. ISSN 2167-8359. PMC PMC7377246. PMID 32742805. https://peerj.com/articles/9539. 
  24. Conze, Ronald; Lorenz, Henning; Ulbricht, Damian; Elger, Kirsten; Gorgas, Thomas (25 January 2017). "Utilizing the International Geo Sample Number Concept in Continental Scientific Drilling During ICDP Expedition COSC-1" (in en). Data Science Journal 16: 2. doi:10.5334/dsj-2017-002. ISSN 1683-1470. http://datascience.codata.org/articles/10.5334/dsj-2017-002/. 
  25. Lehnert, Kerstin; Wyborn, Lesley; Klump, Jens (2019). "FAIR Geoscientific Samples and Data Need International Collaboration" (in en). Acta Geologica Sinica - English Edition 93 (S3): 32–33. doi:10.1111/1755-6724.14236. ISSN 1755-6724. https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-6724.14236. 
  26. Stall, Shelley; Yarmey, Lynn; Cutcher-Gershenfeld, Joel; Hanson, Brooks; Lehnert, Kerstin; Nosek, Brian; Parsons, Mark; Robinson, Erin et al. (1 June 2019). "Make scientific data FAIR" (in en). Nature 570 (7759): 27–29. doi:10.1038/d41586-019-01720-7. https://www.nature.com/articles/d41586-019-01720-7. 
  27. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC PMC4792175. PMID 26978244. https://www.nature.com/articles/sdata201618. 

Notes

This presentation is faithful to the original, with only a few minor changes to formatting. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.