Genome informatics is a field of computational molecular biology and a branch of informatics that uses computers, software, and computational techniques to make observations, solve problems, and manage data related to the genomic function of DNA sequences, the comparison of gene structures, the determination of the tertiary structure of proteins, and other molecular biological activities.
A collaboration between the U.S. Department of Energy and the National Institutes of Health formally brought the Human Genome Project into existence on October 1, 1990. The project sought to identify all human genes and determine the corresponding DNA sequences while also improving computational tools for storing and analyzing the resulting data. Only two months later, on December 3–4, 1990, the first annual Genome Informatics Workshop (GIW) was held in Tokyo, Japan. (With its twelfth meeting in 2001, the event was renamed the International Conference on Genome Informatics.) Although the Human Genome Project did not prompt the first major discussion of applying informatics to genomic research and data management, it was arguably one of the biggest catalysts for the early advancement of genome informatics. In the early 1990s researchers faced many challenges, including the question "Can genome informatics keep up with the technology?" Charles Cantor of the Center for Advanced Biotechnology thought that technology development itself would not hinder the emerging field of genome informatics, but he saw the interface between humans and computers as problematic, particularly for the Human Genome Project. Interest in informatics tools went beyond research on the human genome, however. In June 1994, the Mouse Genome Informatics Group released version 1.0 of the Mouse Genome Database, which included "easy-to-use query options and tools for display, analysis, and reporting" of genomic data.
As genomic and proteomic informatics tools and technologies continued to advance from 1995 to 2005, the cost of DNA sequencing decreased fifty-fold, and further advances in analysis, design, and system integration were expected to reduce costs even more. Those expected reductions materialized through 2015, and the primary challenges shifted to "organizing this data, maintaining it in a way that is accessible and easy to use for researchers around the world, 24 hours a day."
Technology had made genomics and proteomics analysis so accessible that the term "big data" began to be applied to it, as well as to other types of data management, in the 2010s. In January 2015, IBM was reportedly helping the molecular profiling company Caris Life Sciences make sense of its genomics data. The company was generating "more data per patient through its genomic sequencing than any other lab in the United States — with more than half a terabyte of information being generated on a daily basis for individual patient samples."
Future genome informatics concerns will likely include extending genomic data analysis into phenotyping and patient care, as well as addressing the ethics of genomic data collection, storage, and analysis.
Genome informatics can help tackle problems and tasks such as the following:
- analyzing DNA sequences
- recognizing genes and proteins and predicting their structures
- predicting the biochemical function of new genes or fragments
- extracting information from "families of homologous sequences and their structures"
- detecting and classifying near and distant family relations of genes
- molecular profiling
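The source does not tie any of the tasks listed above to a particular implementation. As a hedged illustration of the simplest of them, analyzing DNA sequences and (very crudely) recognizing genes, the following Python sketch computes GC content, a reverse complement, and a naive forward-strand open reading frame (ORF) scan. The function names and the example sequence are hypothetical and chosen only for illustration; real gene finders rely on far richer statistical models, and this is not presented as a method from the cited sources.

```python
# A minimal, illustrative sketch of elementary DNA sequence analysis.
# The example sequence is made up; it is not real genomic data.

# Map each base to its Watson-Crick partner; other characters (e.g. "N")
# pass through translate() unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Naive ORF scan on the forward strand only.

    In each of the three reading frames, an ORF runs from the first ATG to
    the next in-frame stop codon; ATGs nested inside an open ORF are ignored.
    min_codons counts codons from the ATG up to (not including) the stop.
    Returns 0-based, half-open (start, end) coordinates that include the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    demo = "CCATGAAATTTTAGGC"  # hypothetical example sequence
    print(gc_content(demo))          # 0.375
    print(reverse_complement(demo))  # GCCTAAAATTTCATGG
    print(find_orfs(demo))           # [(2, 14)]
```

Only the forward strand is scanned here; running `find_orfs` on `reverse_complement(seq)` would cover the other strand, with coordinates then measured on the reversed sequence.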
The informatics side of genomics has largely focused on analytical tools and methodologies. DNA microarray and sequencing technologies helped Human Genome Project researchers analyze and understand thousands of genes and their expression. By 2000, artificial neural networks were being theorized as a possible informatics tool to aid with data analysis and with the "high dimensionality" of the output data; by 2014, artificial neural networks were being proposed for cancer genomics research.
Aside from creating better algorithms, sequencing tools, and analysis tools, the informatics side of genomics research also involves the development and implementation of public and private genomics databases, which often include tools for displaying, analyzing, and reporting on the stored data. These databases range in size from small, single-purpose data repositories to multi-terabyte, multi-server installations accessed by tens of thousands of people a month.
- Wu, C. H.; McLarty, J. W. (2012). Neural Networks and Genome Informatics. 1 (2nd ed.). Elsevier. pp. 220. ISBN 9780080537375. https://books.google.com/books?id=NcpGMdbP4BkC&pg=PA3.
- Cold Spring Harbor Laboratory Conference on Genome Informatics (U.S.)
- Genome Informatics Conference (U.K.)
- International Conference on Genome Informatics (Japan)
- Mouse Genome Informatics
- A list of global genomics databases and analysis tools can be found hosted by the Health Sciences Library System, University of Pittsburgh.
- Wu, C. H.; McLarty, J. W. (2012). Neural Networks and Genome Informatics. 1 (2nd ed.). Elsevier. pp. 1–4. ISBN 9780080537375. https://books.google.com/books?id=NcpGMdbP4BkC&pg=PA3. Retrieved 14 January 2015.
- "Genome Informatics Vol. 1 (1990)". Genome Informatics. Japanese Society for Bioinformatics. Archived from the original on 20 September 2015. https://web.archive.org/web/20150920193122/http://www.jsbi.org/journal1/gi01/. Retrieved 6 January 2022.
- "GIW International Conference on Genome Informatics". University of Tokyo. http://giw.hgc.jp/. Retrieved 14 January 2015.
- Robbins, Robert J.; Benton, David; Snoddy, Jay (November/December 1995). "Informatics and the Human Genome Project" (PDF). IEEE Engineering in Medicine and Biology Magazine 14 (6): 694–701. doi:10.1109/51.473262. http://www.esp.org/ieee-2.pdf. Retrieved 15 January 2015.
- Cantor, Charles R.; Suhai, Sándor (Ed.) (1994). "Can Computational Science Keep Up With Evolving Technology for Genome Mapping and Sequencing?". Computational Methods in Genome Research. Springer Science & Business Media. pp. 227. ISBN 9780306447129. https://books.google.com/books?id=xfqwBzmAM_kC&pg=PA1. Retrieved 15 January 2015.
- "Chronology of MGI Database Releases". The Jackson Laboratory. 30 December 2014. http://www.informatics.jax.org/mgihome/other/mgicron.shtml. Retrieved 15 January 2015.
- Alterovitz, Gil; Benson, Roseann; Ramoni, Marco F. (2009). "Preface". Automation in Proteomics and Genomics: An Engineering Case-Based Approach. John Wiley & Sons. pp. ix–xi. ISBN 9780470741177. https://books.google.com/books?id=OEYHLzTsEtwC&pg=PR9. Retrieved 15 January 2015.
- Flurry, Alan (18 December 2014). "Building on big data, UPenn and UGA awarded $23.4 million pathogen genomics database contract". UGA Today. University of Georgia Office of Public Affairs. https://news.uga.edu/upenn-uga-23-million-pathogen-genomics-database-contract-1214/. Retrieved 6 January 2022.
- Bresnick, Jennifer (9 January 2015). "Big Data Analytics Research Projects Target Cancer, Genomics". Health IT Analytics. Xtelligent Media, LLC. https://healthitanalytics.com/news/big-data-analytics-research-projects-target-cancer-genomics/. Retrieved 6 January 2022.
- Taft, Darryl K. (9 January 2015). "How IBM Teams Up With Partners to Address the Data Deluge". eWeek. QuinStreet, Inc. https://www.eweek.com/database/how-ibm-teams-up-with-partners-to-address-the-data-deluge/. Retrieved 6 January 2022.
- Lorica, Ben (15 January 2015). "A brief look at data science’s past and future". O'Reilly Radar. O'Reilly Media, Inc. http://radar.oreilly.com/2015/01/a-brief-look-at-data-sciences-past-and-future.html. Retrieved 15 January 2015.
- Oustimov, Andrew; Vu, Vincent (June 2014). "Artificial neural networks in the cancer genomics frontier". Translational Cancer Research 3 (3): 191–201. doi:10.3978/j.issn.2218-676X.2014.05.01.