Difference between revisions of "Journal:Bringing big data to bear in environmental public health: Challenges and recommendations"

From LIMSWiki
Revision as of 17:37, 1 July 2020

Full article title: Bringing big data to bear in environmental public health: Challenges and recommendations
Journal: Frontiers in Artificial Intelligence
Author(s): Comess, Saskia; Akbay, Alexia; Vasiliou, Melpomene; Hines, Ronald N.; Joppa, Lucas; Vasiliou, Vasilis; Kleinstreuer, Nicole
Author affiliation(s): Yale University, Symbrosia Inc., U.S. EPA, Microsoft Corporation, National Institute of Environmental Health Sciences
Primary contact (email): vasilis dot vasiliou at yale dot edu and nicole dot kleinstreuer at nih dot gov
Editors: Emmert-Streib, Frank
Year published: 2020
Volume and issue: 3
Page(s): 31
DOI: 10.3389/frai.2020.00031
ISSN: 2624-8212
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.frontiersin.org/articles/10.3389/frai.2020.00031/full
Download: https://www.frontiersin.org/articles/10.3389/frai.2020.00031/pdf (PDF)

Abstract

Understanding the role that the environment plays in influencing public health often involves collecting and studying large, complex data sets. There have been a number of private and public efforts to gather sufficient information and confront significant unknowns in the field of environmental public health, yet there is a persistent and largely unmet need for findable, accessible, interoperable, and reusable (FAIR) data. Even when data are readily available, the ability to create, analyze, and draw conclusions from these data using emerging computational tools, such as augmented intelligence, artificial intelligence (AI), and machine learning, requires technical skills not currently implemented on a programmatic level across research hubs and academic institutions. We argue that collaborative efforts in data curation and storage, scientific computing, and training are of paramount importance to empower researchers within environmental sciences and the broader public health community to apply AI approaches and fully realize their potential. Leaders in the field were asked to prioritize challenges in incorporating big data in environmental public health research, including inconsistent implementation of FAIR principles in data collection and sharing; a lack of skilled data scientists and appropriate cyber-infrastructures; and limited understanding, identification, and communication of benefits. These issues are discussed and actionable recommendations are provided.

Keywords: artificial intelligence, public health, machine learning, open data, environmental health sciences, big data

Introduction

Out of the tens of thousands of individual chemicals currently in commerce (and many more mixtures, natural products, and metabolites), fewer than 10 percent have been screened for safety. The United States Environmental Protection Agency's (EPA's) Toxic Substances Control Act (TSCA) Chemical Substance Inventory contains roughly 85,000 chemicals[1], and the European Chemicals Agency (ECHA) Inventory lists over 100,000 unique substances (as of the most recent update in August 2017), of which approximately 22,000 are registered substances with detailed information on structure, usage, or toxicity.[2] Understanding which chemicals in the environment, both with and without safety data, pose a risk to human health requires that we more effectively leverage the data that we already have, and that we take intelligent approaches to generating new data. While the traditional means of collecting chemical safety data (animal models) are laborious and of variable accuracy and human relevance[3], such reference data can still be used to train models for prioritizing and predicting toxicity of new chemicals, provided the data are curated in a computationally accessible format and, ideally, integrated with other lines of evidence providing mechanistic information. This requires significant effort, both in collecting and extracting information and in annotating it appropriately.

These toxicological problems are mirrored in public and environmental health more generally as huge, complex issues with inadequately curated data and insufficient analytic power. Recent research in toxicology has focused on high-throughput screening to rapidly produce quantitative data on thousands of human biological targets[4], data mining to identify relevant endpoints for building predictive models for adverse toxicological outcomes[5], and application of cutting-edge machine learning (ML) and artificial or augmented intelligence (AI) techniques.[6] Collectively, these technologies facilitate enhanced mechanistic insights and may obviate the need for inefficient testing in animal models, but they are not yet considered mainstream approaches, nor are they widely accepted by regulatory agencies.

Individual research programs generate large data sets, but without centralized coordination, standardized reporting, and common storage structures, the data cannot be effectively combined and used to their full potential. The federal Tox21 research consortium, for example, has to date tested more than 9,000 chemicals to varying degrees in 1,600 assays and demonstrated environmental chemical interactions with critical human and ecologically relevant targets.[7] Translational systems approaches are being employed by this and other programs (e.g., Horizon 2020, EUToxRisk, CEFIC LRI, and OpenTox) to produce diverse data streams and predict chemical effects on human health and disease outcomes.[8] At the same time, there have been substantial efforts to develop and deploy sensors and satellite systems that yield additional large and complex data sets that provide information about chemical exposures.[9][10][11]


References

  1. Environmental Protection Agency. "TSCA Chemical Substance Inventory: About the TSCA Chemical Substance Inventory". Environmental Protection Agency. https://www.epa.gov/tsca-inventory/about-tsca-chemical-substance-inventory. Retrieved 04 January 2019. 
  2. European Chemicals Agency. "EC Inventory". European Chemicals Agency. https://echa.europa.eu/information-on-chemicals/ec-inventory. Retrieved 04 January 2019. 
  3. Hartung, T. (2009). "Toxicology for the Twenty-First Century". Nature 460 (7252): 208–12. doi:10.1038/460208a. PMID 19587762. 
  4. Thomas, R.S.; Bahadori, T.; Buckley, T.J. et al. (2019). "The Next Generation Blueprint of Computational Toxicology at the U.S. Environmental Protection Agency". Toxicological Sciences 169 (2): 317–32. doi:10.1093/toxsci/kfz058. PMC6542711. PMID 30835285. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6542711.
  5. Saili, K.S.; Franzosa, J.A.; Baker, N.C. et al. (2019). "Systems Modeling of Developmental Vascular Toxicity". Current Opinion in Toxicology 15 (1): 55–63. doi:10.1016/j.cotox.2019.04.004. PMC7004230. PMID 32030360. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7004230.
  6. Luechtefeld, T.; Rowlands, C.; Hartung, T. (2018). "Big-data and Machine Learning to Revamp Computational Toxicology and Its Use in Risk Assessment". Toxicology Research 7 (5): 732–44. doi:10.1039/c8tx00051d. PMC6116175. PMID 30310652. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6116175.
  7. Tice, R.R.; Austin, C.P.; Kavlock, R.J. et al. (2013). "Improving the Human Hazard Characterization of Chemicals: A Tox21 Update". Environmental Health Perspectives 121 (7): 756–65. doi:10.1289/ehp.1205784. PMC3701992. PMID 23603828. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3701992.
  8. Kleinstreuer, N.C.; Yang, J.; Berg, E.L. et al. (2014). "Phenotypic Screening of the ToxCast Chemical Library to Classify Toxic and Therapeutic Mechanisms". Nature Biotechnology 32 (6): 583–91. doi:10.1038/nbt.2914. PMID 24837663. 
  9. Dons, E.; Laeremans, M.; Orjuela, J.P. et al. (2017). "Wearable Sensors for Personal Monitoring and Estimation of Inhaled Traffic-Related Air Pollution: Evaluation of Methods". Environmental Science and Technology 51 (3): 1859–67. doi:10.1021/acs.est.6b05782. PMID 28080048.
  10. Ring, C.L.; Arnot, J.A.; Bennett, D.H. et al. (2019). "Consensus Modeling of Median Chemical Intake for the U.S. Population Based on Predictions of Exposure Pathways". Environmental Science and Technology 53 (2): 719–32. doi:10.1021/acs.est.8b04056. PMC6690061. PMID 30516957. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6690061.
  11. Weichenthal, S.; Hatzopoulou, M.; Brauer, M. (2019). "A Picture Tells a thousand…exposures: Opportunities and Challenges of Deep Learning Image Analyses in Exposure Science and Environmental Epidemiology". Environment International 122: 3–10. doi:10.1016/j.envint.2018.11.042. PMID 30473381.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version—by design—lists them in order of appearance.