Journal:Building infrastructure for African human genomic data management

From LIMSWiki
Full article title: Building infrastructure for African human genomic data management
Journal: Data Science Journal
Author(s): Parker, Ziyaad; Maslamoney, Suresh; Meintjes, Ayton; Botha, Gerrit; Panji, Sumir; Hazelhurst, Scott; Mulder, Nicola
Author affiliation(s): University of Cape Town, University of the Witwatersrand
Primary contact: Email: ziyaad dot parker at uct dot ac dot za
Year published: 2019
Volume and issue: 18(1)
Page(s): 47
DOI: 10.5334/dsj-2019-047
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International
Website: https://datascience.codata.org/articles/10.5334/dsj-2019-047/
Download: https://datascience.codata.org/articles/10.5334/dsj-2019-047/galley/894/download/ (PDF)

Abstract

Human genomic data are large and complex, and require adequate infrastructure for secure storage and transfer. The National Institutes of Health (NIH) and The Wellcome Trust have funded multiple projects on genomic research, including the Human Heredity and Health in Africa (H3Africa) initiative, and the resulting data are required to be deposited into the public domain. The European Genome-phenome Archive (EGA) is a repository for sequence and genotype data where data access is controlled by access committees. Access is determined by a formal application procedure for the purpose of secure storage and distribution, which must be in line with the informed consent of the study participants. H3Africa researchers based in Africa and generating their own data can benefit tremendously from the data sharing capabilities of the internet by using the appropriate technologies. The H3Africa Data Archive is a joint effort between the H3Africa data generating projects, H3ABioNet, and the EGA to store and submit genomic data to public repositories. H3ABioNet maintains the security of the H3Africa Data Archive, ensures ethical security compliance, supports users with data submission, and facilitates data transfers. The goal is to ensure efficient data flow between researchers, the archive, and the EGA or other public repositories. To comply with the H3Africa data sharing and release policy, nine months after the data are placed in secure storage, H3ABioNet converts the data into an Extensible Markup Language (XML) format ready for submission to the EGA. This article describes the infrastructure that has been developed for African human genomic data management.

Keywords: genomic data, data archive, H3Africa data, African genomic data

Introduction

Advances in high-throughput genomic technologies are laying the foundations for the goal of precision medicine to be realized.[1][2] Decreasing costs and the capacity to generate larger volumes of human genomic data at faster rates are enabling population-level genomics studies to be conducted.[3][4] However, most of the population-level genomics studies and data generated to date have a significant population representational bias, with the majority of genome sequences derived from individuals of European and North American ancestry, from regions that were early adopters of genomic technologies.[4][5] African researchers, in general, have been late adopters of high-throughput technologies for use in population genomics due to more limited resources and funding. To address this critical gap in scientific knowledge about African genomics and population variation, and inspired by the African Society for Human Genetics, the National Institutes of Health (NIH) and The Wellcome Trust, through the Human Heredity and Health in Africa (H3Africa) program, have funded multiple genomics projects led by African investigators.[6][7] To support the H3Africa projects by provisioning infrastructure for secure data storage, management, and computing, the NIH has also funded a Pan-African Bioinformatics Network for H3Africa (H3ABioNet).[8]

The H3Africa Consortium consists of multiple projects and sites distributed across Africa, most of which are generating genomic data linked to clinical data for specific diseases. The principal H3Africa funders (the NIH and The Wellcome Trust) require any project data generated to be deposited into a data repository accessible by the scientific community.[9][10] In order to facilitate the storage and accessibility of H3Africa genomics data, significant infrastructure, procedures, and policies were established. Part of H3ABioNet’s mandate is to develop processes and implement an infrastructure that will enable the ingestion, validation, annotation, secure storage, and submission of the African genomics data to the controlled-access European Genome-phenome Archive (EGA).[11] This has been achieved through the development of the H3Africa Data Archive, which also ensures a copy of the genomic data is securely stored and retained on the African continent.[8][12] This article describes the infrastructure that has been developed, which, to our knowledge, is the first formalized human genomic data archive on the continent.

References

  1. Christensen, K.D.; Dukhovny, D.; Siebert, U. et al. (2015). "Assessing the Costs and Cost-Effectiveness of Genomic Sequencing". Journal of Personalized Medicine 5 (4): 470–86. doi:10.3390/jpm5040470. PMC PMC4695866. PMID 26690481. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4695866.
  2. Aronson, S.J.; Rehm, H.L. (2015). "Building the foundation for genomics in precision medicine". Nature 526 (7573): 336–42. doi:10.1038/nature15816. PMC PMC5669797. PMID 26469044. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5669797.
  3. Goldfeder, R.L.; Wall, D.P.; Khoury, M.J. et al. (2017). "Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis". American Journal of Epidemiology 186 (8): 1000–1009. doi:10.1093/aje/kww224. PMC PMC6250075. PMID 29040395. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6250075.
  4. Prokop, J.W.; May, T.; Strong, K. et al. (2018). "Genome sequencing in the clinic: the past, present, and future of genomic medicine". Physiological Genomics 50 (8): 563–79. doi:10.1152/physiolgenomics.00046.2018. PMC PMC6139636. PMID 29727589. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139636.
  5. Popejoy, A.B.; Fullerton, S.M. (2016). "Genomics is failing on diversity". Nature 538 (7624): 161–64. doi:10.1038/538161a. PMC PMC5089703. PMID 27734877. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5089703. 
  6. H3Africa Consortium; Rotimi, C.; Abayomi, A. et al. (2014). "Research capacity. Enabling the genomic revolution in Africa". Science 344 (6190): 1346–8. doi:10.1126/science.1251546. PMC PMC4138491. PMID 24948725. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138491. 
  7. Mulder, N.; Abimiku, A.; Adebamowo, S.N. et al. (2018). "H3Africa: Current perspectives". Pharmacogenomics and Personalized Medicine 11: 59–86. doi:10.2147/PGPM.S141546. PMC PMC5903476. PMID 29692621. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903476. 
  8. Mulder, N.J.; Adebiyi, E.; Alami, R. et al. (2016). "H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa". Genome Research 26 (2): 271–7. doi:10.1101/gr.196295.115. PMC PMC4728379. PMID 26627985. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4728379.
  9. "NIH Sharing Policies and Related Guidance on NIH-Funded Research Resources". Grants & Funding. National Institutes of Health. 2019. https://grants.nih.gov/policy/sharing.htm. 
  10. "Data, software and materials management and sharing policy". Funding. The Wellcome Trust. 10 July 2017. https://wellcome.ac.uk/funding/guidance/data-software-materials-management-and-sharing-policy. 
  11. Lappalainen, I.; Almeida-King, J.; Kumanduri, V. et al. (2015). "The European Genome-phenome Archive of human data consented for biomedical research". Nature Genetics 47 (7): 692–5. doi:10.1038/ng.3312. PMC PMC5426533. PMID 26111507. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5426533. 
  12. Mulder, N.J.; Adebiyi, E.; Adebiyi, M. et al. (2017). "Development of Bioinformatics Infrastructure for Genomics Research". Global Heart 12 (2): 91–98. doi:10.1016/j.gheart.2017.01.005. PMC PMC5582980. PMID 28302555. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5582980. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original paper listed references alphabetically; this wiki lists them by order of appearance, by design. The two footnotes were turned into inline references for convenience.