Difference between revisions of "Genome informatics"

From LIMSWiki
(Added content. Saving and adding more.)
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''Genome informatics''' is a field of computational molecular biology and branch of [[Informatics (academic field)|informatics]] that uses computers, software, and computational solution techniques to make observations, resolve problems, and manage data related to the genomic function of DNA sequences, comparison of gene structures, determination of the tertiary structure of all proteins, and other molecular biological activities.<ref name="WuNeural">{{cite book |url=https://books.google.com/books?id=NcpGMdbP4BkC&pg=PA3 |title=Neural Networks and Genome Informatics |author=Wu, C. H.; McLarty, J. W. |publisher=Elsevier |volume=1 |year=2012 |pages=1–4 |isbn=9780080537375 |accessdate=14 January 2015}}</ref>
[[File:Genome sequencing costs 2011.jpg|right|thumb|550px|The cost of genome sequencing has drastically decreased thanks to the Human Genome Project and associated pushes to further genome informatics.]]'''Genome informatics''' is a field of computational molecular biology and branch of [[Informatics (academic field)|informatics]] that uses computers, software, and computational solution techniques to make observations, resolve problems, and manage data related to the genomic function of DNA sequences, comparison of gene structures, determination of the tertiary structure of all proteins, and other molecular biological activities.<ref name="WuNeural">{{cite book |url=https://books.google.com/books?id=NcpGMdbP4BkC&pg=PA3 |title=Neural Networks and Genome Informatics |author=Wu, C. H.; McLarty, J. W. |publisher=Elsevier |volume=1 |year=2012 |edition=2nd |pages=1–4 |isbn=9780080537375 |accessdate=14 January 2015}}</ref>


==History==
==History==
A collaboration between the U.S. Department of Energy and the [[National Institutes of Health]] brought the Human Genome Project formally into existence on October 1, 1990. The project sought to identify all human genes and determine the related DNA sequences while also improving storage and analysis computing tools. Only two months later, on December 3–4, 1990, the first annual Genome Informatics Workshop (GIW) was hosted in Tokyo, Japan.<ref name="GIW1">{{cite web |url=http://www.jsbi.org/journal1/gi01/ |title=Genome Informatics Vol. 1 (1990) |work=Genome Informatics |publisher=Japanese Society for Bioinformatics |accessdate=14 January 2015}}</ref> (The name of the event changed with the twelfth meeting in 2001 to the International Conference on Genome Informatics.<ref name="GIW11">{{cite web |url=http://giw.hgc.jp/ |title=GIW International Conference on Genome Informatics |publisher=University of Tokyo |accessdate=14 January 2015}}</ref>) While not the first major discussion about applying informatics to genomic research and data management, the Human Genome Project was arguably one of the biggest catalysts for the initial advancement of genome informatics.<ref name="HGPandInfo">{{cite journal |url=http://www.esp.org/ieee-2.pdf |format=PDF |title=Informatics and the Human Genome Project |journal=IEEE Engineering in Medicine and Biology Magazine |author=Robbins, Robert J.; Benton, David; Snoddy, Jay |volume=14 |issue=6 |pages=694–701 |year=November/December 1995 |doi=10.1109/51.473262 |accessdate=15 January 2015}}</ref> In the early 1990s researchers were faced with many challenges, including the question "Can genome informatics keep up with the technology?" 
Charles Cantor of the Center for Advanced Biotechnology thought that that technology development itself would not hinder the emerging field of genome informatics, but he saw the interface between human and computers to be problematic, particularly for the Human Genome Project.<ref name="CantorCompMeth">{{cite book |url=https://books.google.com/books?id=xfqwBzmAM_kC&pg=PA1 |title=Computational Methods in Genome Research |chapter=Can Computational Science Keep Up With Evolving Technology for Genome Mapping and Sequencing? |author=Cantor, Charles R.; Suhai, Sándor (Ed.) |publisher=Springer Science & Business Media |pages=227 |year=1994 |isbn=9780306447129 |accessdate=15 January 2015}}</ref> Interest in informatics tools went beyond researching the human genome, however. In June 1994, the Mouse Genome Informatics Group released version 1.0 of the Mouse Genome Database that included "easy-to-use query options and tools for display, analysis, and reporting" of genomic data.<ref name="MDG1">{{cite web |url=http://www.informatics.jax.org/mgihome/other/mgicron.shtml |title=Chronology of MGI Database Releases |publisher=The Jackson Laboratory |date=30 December 2014 |accessdate=15 January 2015}}</ref>
A collaboration between the U.S. Department of Energy and the [[National Institutes of Health]] brought the Human Genome Project formally into existence on October 1, 1990. The project sought to identify all human genes and determine the related DNA sequences while also improving storage and analysis computing tools. Only two months later, on December 3–4, 1990, the first annual Genome Informatics Workshop (GIW) was hosted in Tokyo, Japan.<ref name="GIW1">{{cite web |url=http://www.jsbi.org/journal1/gi01/ |archiveurl=https://web.archive.org/web/20150920193122/http://www.jsbi.org/journal1/gi01/ |title=Genome Informatics Vol. 1 (1990) |work=Genome Informatics |publisher=Japanese Society for Bioinformatics |archivedate=20 September 2015 |accessdate=06 January 2022}}</ref> (The name of the event changed with the twelfth meeting in 2001 to the International Conference on Genome Informatics.<ref name="GIW11">{{cite web |url=http://giw.hgc.jp/ |title=GIW International Conference on Genome Informatics |publisher=University of Tokyo |accessdate=14 January 2015}}</ref>) While not the first major discussion about applying informatics to genomic research and data management, the Human Genome Project was arguably one of the biggest catalysts for the initial advancement of genome informatics.<ref name="HGPandInfo">{{cite journal |url=http://www.esp.org/ieee-2.pdf |format=PDF |title=Informatics and the Human Genome Project |journal=IEEE Engineering in Medicine and Biology Magazine |author=Robbins, Robert J.; Benton, David; Snoddy, Jay |volume=14 |issue=6 |pages=694–701 |year=November/December 1995 |doi=10.1109/51.473262 |accessdate=15 January 2015}}</ref> In the early 1990s researchers were faced with many challenges, including the question "Can genome informatics keep up with the technology?" 
Charles Cantor of the Center for Advanced Biotechnology thought that that technology development itself would not hinder the emerging field of genome informatics, but he saw the interface between human and computers to be problematic, particularly for the Human Genome Project.<ref name="CantorCompMeth">{{cite book |url=https://books.google.com/books?id=xfqwBzmAM_kC&pg=PA1 |title=Computational Methods in Genome Research |chapter=Can Computational Science Keep Up With Evolving Technology for Genome Mapping and Sequencing? |author=Cantor, Charles R.; Suhai, Sándor (Ed.) |publisher=Springer Science & Business Media |pages=227 |year=1994 |isbn=9780306447129 |accessdate=15 January 2015}}</ref> Interest in informatics tools went beyond researching the human genome, however. In June 1994, the Mouse Genome Informatics Group released version 1.0 of the Mouse Genome Database that included "easy-to-use query options and tools for display, analysis, and reporting" of genomic data.<ref name="MDG1">{{cite web |url=http://www.informatics.jax.org/mgihome/other/mgicron.shtml |title=Chronology of MGI Database Releases |publisher=The Jackson Laboratory |date=30 December 2014 |accessdate=15 January 2015}}</ref>
 
As genomic and proteomic informatics tools and technologies continued to advance from 1995 to 2005, the costs associated with DNA sequencing decreased fifty-fold; advances in technology were expected to improve analysis, design, and system integration and reduce the cost even further.<ref name="GilAuto">{{cite book |url=https://books.google.com/books?id=OEYHLzTsEtwC&pg=PR9 |title=Automation in Proteomics and Genomics: An Engineering Case-Based Approach |chapter=Preface |author=Alterovitz, Gil; Benson, Roseann; Ramoni, Marco F. |publisher=John Wiley & Sons |year=2009 |pages=ix–xi |isbn=9780470741177 |accessdate=15 January 2015}}</ref> Those cost benefits were realized into 2015, with primary challenges shifting to "organizing this data, maintaining it in a way that is accessible and easy to use for researchers around the world, 24 hours a day."<ref name="FlurryBD">{{cite web |url=https://news.uga.edu/upenn-uga-23-million-pathogen-genomics-database-contract-1214/ |title=Building on big data, UPenn and UGA awarded $23.4 million pathogen genomics database contract |author=Flurry, Alan |publisher=University of Georgia Office of Public Affairs |work=UGA Today |date=18 December 2014 |accessdate=06 January 2022}}</ref>
 
Technology had made genomics and proteomics analysis so accessible that term "big data" began being used in relation to it and other types of data management in the 2010s.<ref name="FlurryBD" /><ref name="BresnickBD">{{cite web |url=https://healthitanalytics.com/news/big-data-analytics-research-projects-target-cancer-genomics/ |title=Big Data Analytics Research Projects Target Cancer, Genomics |author=Bresnick, Jennifer |publisher=Xtelligent Media, LLC |work=Health IT Analytics |date=09 January 2015 |accessdate=06 January 2022}}</ref> In January 2015, IBM was reportedly helping molecular profiling company Caris Life Sciences make sense of its genomics data. The company was generating "more data per patient through its genomic sequencing than any other lab in the United States — with more than half a terabyte of information being generated on a daily basis for individual patient samples."<ref name="eWeekCaris">{{cite web |url=https://www.eweek.com/database/how-ibm-teams-up-with-partners-to-address-the-data-deluge/ |title=How IBM Teams Up With Partners to Address the Data Deluge |author=Taft, Darryl K. |publisher=QuinStreet, Inc |work=eWeek |date=09 January 2015 |accessdate=06 January 2022}}</ref>
 
Future genome informatics concerns will likely include taking genomic data analysis to phenotyping to patient care and considering the ethics of genomic data collection, storage, and analysis.<ref name="LoricaBD">{{cite web |url=http://radar.oreilly.com/2015/01/a-brief-look-at-data-sciences-past-and-future.html |title=A brief look at data science’s past and future |author=Lorica, Ben |publisher=O'Reilly Media, Inc |work=O'Reilly Radar |date=15 January 2015 |accessdate=15 January 2015}}</ref>


As genomic and proteomic informatics tools and technologies continued to advance from 1995 to 2005, the costs associated with DNA sequencing decreased fifty-fold; advances in technology were expected to improve analysis, design, and system integration and reduce the cost even further.<ref name="GilAuto">{{cite book |url=https://books.google.com/books?id=OEYHLzTsEtwC&pg=PR9 |title=Automation in Proteomics and Genomics: An Engineering Case-Based Approach |chapter=Preface |author=Alterovitz, Gil; Benson, Roseann; Ramoni, Marco F. |publisher=John Wiley & Sons |year=2009 |pages=ix–xi |isbn=9780470741177 |accessdate=15 January 2015}}</ref>
==Application==
==Application==
Genome informatics can help tackle problems and tasks such as the following<ref name="WuNeural" />:
Genome informatics can help tackle problems and tasks such as the following<ref name="WuNeural" />:
Line 13: Line 18:
* extracting information from "families of homologous sequences and their structures"
* extracting information from "families of homologous sequences and their structures"
* detecting and classifying near and distant family relations of genes
* detecting and classifying near and distant family relations of genes
* molecular profiling


==Informatics==
==Informatics==
The informatics side of genomics has largely focused on analytical tools and methodologies. DNA-microarray and sequencing technology helped researchers for the Human Genome Project analyze and understand thousands of genes and their expressions. By 2000, artificial neural networks were being theorized as a possible informatics tools to aid with data analysis and the problem of "high dimensionality" of the outputted data; by 2014 artificial neural networks were being proposed for cancer genomic research.<ref name="WuNeural" /><ref name="Oustimov">{{cite journal |title=Artificial neural networks in the cancer genomics frontier |journal=Translational Cancer Research |author=Oustimov, Andrew; Vu, Vincent |volume=3 |issue=3 |year=June 2014 |pages=191–201 |doi=10.3978/j.issn.2218-676X.2014.05.01}}</ref>


Aside from creating better algorithms, sequencing tools, and analysis tools, the informatics side of genomics research also involves the development and implementation of public and private genomics databases, which often include data display, analysis, and reporting tools to apply to the contained data. These databases can range in size from small, single-purpose data repositories to multi-terabyte, multi-server installations accessed by tens of thousands of people a month.<ref name="FlurryBD" />


==Further reading==
==Further reading==
 
* {{cite book |url=https://books.google.com/books?id=NcpGMdbP4BkC&pg=PA3 |title=Neural Networks and Genome Informatics |author=Wu, C. H.; McLarty, J. W. |publisher=Elsevier |volume=1 |edition=2nd |year=2012 |pages=220 |isbn=9780080537375}}
* {{cite book |url=https://books.google.com/books?id=NcpGMdbP4BkC&pg=PA3 |title=Neural Networks and Genome Informatics |author=Wu, C. H.; McLarty, J. W. |publisher=Elsevier |volume=1 |year=2012 |pages=220 |isbn=9780080537375}}
 
==External links==
==External links==
===Conferences===
===Conferences===
* [http://meetings.cshl.edu/meetings/2015/info15.shtml Cold Spring Harbor Laboratory Conference on Genome Informatics] (U.S.)
* [https://meetings.cshl.edu/meetings.aspx?meet=info&year=21 Cold Spring Harbor Laboratory Conference on Genome Informatics] (U.S.)
* [https://registration.hinxton.wellcome.ac.uk/display_info.asp?id=406 Genome Informatics Conference] (U.K.)
* [https://coursesandconferences.wellcomeconnectingscience.org/our-events/conferences/ Genome Informatics Conference] (U.K.)
* [http://www.jsbi.org/giw2014/ International Conference on Genome Informatics] (Japan)
* [https://www.jsbi.org/en/giw/ International Conference on Genome Informatics] (Japan)
* [http://mysrm.srmuniv.ac.in/ncgi/ National Conference on Genome Informatics] (India)


===Databases===
===Databases===
* [http://www.informatics.jax.org/ Mouse Genome Informatics database]
* [https://veupathdb.org/veupathdb/app VEuPathDB]
* [http://www.informatics.jax.org/ Mouse Genome Informatics]
* A list of global [https://www.hsls.pitt.edu/obrc/index.php?page=genomics genomics databases and analysis tools] can be found hosted by the Health Sciences Library System, University of Pittsburgh.


==References==
==References==
<references/>
{{Reflist|colwidth=30em}}


<!---Place all category tags here-->
<!---Place all category tags here-->
[[Category:Genomics]]
[[Category:Informatics]]
[[Category:Informatics]]

Latest revision as of 15:05, 20 September 2022

The cost of genome sequencing has drastically decreased thanks to the Human Genome Project and associated pushes to further genome informatics.

Genome informatics is a field of computational molecular biology and branch of informatics that uses computers, software, and computational solution techniques to make observations, resolve problems, and manage data related to the genomic function of DNA sequences, comparison of gene structures, determination of the tertiary structure of all proteins, and other molecular biological activities.[1]

History

A collaboration between the U.S. Department of Energy and the National Institutes of Health brought the Human Genome Project formally into existence on October 1, 1990. The project sought to identify all human genes and determine the related DNA sequences while also improving storage and analysis computing tools. Only two months later, on December 3–4, 1990, the first annual Genome Informatics Workshop (GIW) was hosted in Tokyo, Japan.[2] (The name of the event changed with the twelfth meeting in 2001 to the International Conference on Genome Informatics.[3]) While not the first major discussion about applying informatics to genomic research and data management, the Human Genome Project was arguably one of the biggest catalysts for the initial advancement of genome informatics.[4] In the early 1990s, researchers faced many challenges, including the question "Can genome informatics keep up with the technology?" Charles Cantor of the Center for Advanced Biotechnology thought that technology development itself would not hinder the emerging field of genome informatics, but he saw the interface between humans and computers as problematic, particularly for the Human Genome Project.[5] Interest in informatics tools went beyond researching the human genome, however. In June 1994, the Mouse Genome Informatics Group released version 1.0 of the Mouse Genome Database, which included "easy-to-use query options and tools for display, analysis, and reporting" of genomic data.[6]

As genomic and proteomic informatics tools and technologies continued to advance from 1995 to 2005, the costs associated with DNA sequencing decreased fifty-fold; advances in technology were expected to improve analysis, design, and system integration and reduce the cost even further.[7] Those cost benefits were realized into 2015, with primary challenges shifting to "organizing this data, maintaining it in a way that is accessible and easy to use for researchers around the world, 24 hours a day."[8]

Technology had made genomics and proteomics analysis so accessible that the term "big data" began to be used in relation to it and other types of data management in the 2010s.[8][9] In January 2015, IBM was reportedly helping molecular profiling company Caris Life Sciences make sense of its genomics data. The company was generating "more data per patient through its genomic sequencing than any other lab in the United States — with more than half a terabyte of information being generated on a daily basis for individual patient samples."[10]

Future genome informatics concerns will likely include carrying genomic data analysis through phenotyping and into patient care, as well as considering the ethics of genomic data collection, storage, and analysis.[11]

Application

Genome informatics can help tackle problems and tasks such as the following[1]:

  • analyzing DNA sequences
  • recognizing genes and proteins and predicting their structures
  • predicting the biochemical function of new genes or fragments
  • extracting information from "families of homologous sequences and their structures"
  • detecting and classifying near and distant family relations of genes
  • molecular profiling
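
As a rough illustration of the first task in the list above (analyzing DNA sequences), the following is a minimal Python sketch of three elementary sequence statistics. It is not drawn from the cited sources; the function names (gc_content, reverse_complement, kmer_counts) are hypothetical, and the sketch assumes sequences contain only the unambiguous bases A, C, G, and T.

```python
from collections import Counter

# Translation table mapping each base to its complement (assumes only A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    counts = Counter(seq)
    return (counts["G"] + counts["C"]) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    example = "ATGCGC"  # hypothetical toy sequence, not real genomic data
    print(gc_content(example))
    print(reverse_complement(example))
    print(kmer_counts(example, 2).most_common(3))
```

Real genomic data would normally be read from standard formats such as FASTA and may include ambiguity codes (for example, N), which this sketch does not handle.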

Informatics

The informatics side of genomics has largely focused on analytical tools and methodologies. DNA-microarray and sequencing technology helped researchers for the Human Genome Project analyze and understand thousands of genes and their expressions. By 2000, artificial neural networks were being theorized as a possible informatics tool to aid with data analysis and the problem of "high dimensionality" of the output data; by 2014, artificial neural networks were being proposed for cancer genomic research.[1][12]

Aside from creating better algorithms, sequencing tools, and analysis tools, the informatics side of genomics research also involves the development and implementation of public and private genomics databases, which often include data display, analysis, and reporting tools to apply to the contained data. These databases can range in size from small, single-purpose data repositories to multi-terabyte, multi-server installations accessed by tens of thousands of people a month.[8]

Further reading

External links

Conferences

Databases

References

  1. Wu, C. H.; McLarty, J. W. (2012). Neural Networks and Genome Informatics. 1 (2nd ed.). Elsevier. pp. 1–4. ISBN 9780080537375. https://books.google.com/books?id=NcpGMdbP4BkC&pg=PA3. Retrieved 14 January 2015.
  2. "Genome Informatics Vol. 1 (1990)". Genome Informatics. Japanese Society for Bioinformatics. Archived from the original on 20 September 2015. https://web.archive.org/web/20150920193122/http://www.jsbi.org/journal1/gi01/. Retrieved 06 January 2022. 
  3. "GIW International Conference on Genome Informatics". University of Tokyo. http://giw.hgc.jp/. Retrieved 14 January 2015. 
  4. Robbins, Robert J.; Benton, David; Snoddy, Jay (November/December 1995). "Informatics and the Human Genome Project" (PDF). IEEE Engineering in Medicine and Biology Magazine 14 (6): 694–701. doi:10.1109/51.473262. http://www.esp.org/ieee-2.pdf. Retrieved 15 January 2015. 
  5. Cantor, Charles R.; Suhai, Sándor (Ed.) (1994). "Can Computational Science Keep Up With Evolving Technology for Genome Mapping and Sequencing?". Computational Methods in Genome Research. Springer Science & Business Media. p. 227. ISBN 9780306447129. https://books.google.com/books?id=xfqwBzmAM_kC&pg=PA1. Retrieved 15 January 2015.
  6. "Chronology of MGI Database Releases". The Jackson Laboratory. 30 December 2014. http://www.informatics.jax.org/mgihome/other/mgicron.shtml. Retrieved 15 January 2015. 
  7. Alterovitz, Gil; Benson, Roseann; Ramoni, Marco F. (2009). "Preface". Automation in Proteomics and Genomics: An Engineering Case-Based Approach. John Wiley & Sons. pp. ix–xi. ISBN 9780470741177. https://books.google.com/books?id=OEYHLzTsEtwC&pg=PR9. Retrieved 15 January 2015. 
  8. Flurry, Alan (18 December 2014). "Building on big data, UPenn and UGA awarded $23.4 million pathogen genomics database contract". UGA Today. University of Georgia Office of Public Affairs. https://news.uga.edu/upenn-uga-23-million-pathogen-genomics-database-contract-1214/. Retrieved 06 January 2022.
  9. Bresnick, Jennifer (9 January 2015). "Big Data Analytics Research Projects Target Cancer, Genomics". Health IT Analytics. Xtelligent Media, LLC. https://healthitanalytics.com/news/big-data-analytics-research-projects-target-cancer-genomics/. Retrieved 06 January 2022. 
  10. Taft, Darryl K. (9 January 2015). "How IBM Teams Up With Partners to Address the Data Deluge". eWeek. QuinStreet, Inc. https://www.eweek.com/database/how-ibm-teams-up-with-partners-to-address-the-data-deluge/. Retrieved 06 January 2022. 
  11. Lorica, Ben (15 January 2015). "A brief look at data science’s past and future". O'Reilly Radar. O'Reilly Media, Inc. http://radar.oreilly.com/2015/01/a-brief-look-at-data-sciences-past-and-future.html. Retrieved 15 January 2015. 
  12. Oustimov, Andrew; Vu, Vincent (June 2014). "Artificial neural networks in the cancer genomics frontier". Translational Cancer Research 3 (3): 191–201. doi:10.3978/j.issn.2218-676X.2014.05.01.