Journal:DGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks

From LIMSWiki
Revision as of 19:10, 30 January 2017

Full article title DGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks
Journal BMC Bioinformatics
Author(s) Lukauskas, Saulius; Visintainer, Roberto; Sanguinetti, Guido; Schweikert, Gabriele B.
Author affiliation(s) Imperial College London, Fondazione Bruno Kessler, University of Edinburgh
Primary contact Email: saulius dot lukauskas13 at imperial dot ac dot uk
Year published 2016
Volume and issue 17(Suppl 16)
Page(s) 447
DOI 10.1186/s12859-016-1306-0
ISSN 1471-2105
Distribution license Creative Commons Attribution 4.0 International
Website http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1306-0
Download http://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1306-0 (PDF)

Abstract

Background: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent.

Results: We present DGW, an open-source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses dynamic time warping to adaptively rescale and align genomic distances, which makes it possible to group regions of interest with similar shapes and thereby capture the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project.

Conclusions: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open-source Python package.

Keywords: Clustering, ChIP-seq, epigenetics, dynamic time warping
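To illustrate the core idea named in the abstract, here is a minimal sketch of classic dynamic time warping (DTW) between two binned coverage profiles, assuming each region is represented as a list of per-bin read counts. The helper names `dtw_distance` and `euclidean_distance`, the absolute-difference local cost and the toy profiles are hypothetical choices for this example. They are not DGW's API, and DGW's actual alignment may add warping constraints and normalisation. The example shows why DTW suits shape comparison: a bimodal profile shifted by one bin has a DTW distance of zero, while a bin-by-bin Euclidean comparison penalises the shift.

```python
from math import sqrt


def dtw_distance(x, y):
    """Classic DTW distance between two 1-D profiles (hypothetical helper).

    Local cost is the absolute difference between bins; the allowed steps
    are match, insertion and deletion, with no warping-window constraint.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in x only (stretch y)
                cost[i][j - 1],      # advance in y only (stretch x)
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def euclidean_distance(x, y):
    """Bin-by-bin Euclidean distance; assumes equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Toy bimodal profile a, and b: the same profile shifted right by one bin.
a = [0, 1, 5, 1, 0, 0, 3, 0, 0, 0]
b = [0, 0, 1, 5, 1, 0, 0, 3, 0, 0]
# A broad, unimodal profile with a genuinely different shape.
c = [0, 0, 0, 2, 2, 2, 2, 0, 0, 0]

print(dtw_distance(a, b))        # 0.0: the shift is absorbed by warping
print(euclidean_distance(a, b))  # about 7.21: the one-bin shift is penalised
print(dtw_distance(a, c) > 0)    # True: different shapes stay apart
```

Leaving out a warping window keeps the sketch short; constrained variants such as a Sakoe-Chiba band limit how far bins may be displaced, which is usually desirable for genomic profiles of fixed width.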

Background

Sequencing-based technologies such as ChIP-seq and DNase-seq (reviewed, e.g., in Furey 2012[1]) have revolutionized our understanding of chromatin structure and function, yielding deep insights into the importance of epigenomic marks in the basic processes of life. The emerging picture is that gene expression is controlled by a complex interplay of protein binding and epigenomic modifications. While histone marks (and other epigenomic marks) can be measured in a high-throughput way, exploratory data analysis techniques for these data types are still being developed. Epigenomic marks have characteristics that fundamentally distinguish them from, e.g., mRNA gene expression measurements: they extend across regions as wide as several kilobases, within which they often show interesting local structure, such as multiple peaks and troughs[2] and intriguing asymmetries[3] (see Fig. 1).

The shape of epigenomic marks appears to be highly conserved across replicate data sets, and this has recently been exploited for statistical testing [4]. While the biological reasons for such conservation are not entirely clear, recent studies suggest that both architectural and regulatory aspects may be at play. Bieberstein and colleagues showed intriguing patterns of accumulation of the histone marks H3K4me3 and H3K9ac at splice sites [5], hinting at an architectural origin of the shape of the marks. More recently, Benveniste et al. showed that histone marks can be predicted very well genome-wide from the binding patterns of transcription factors (TFs) [6]. The shape of a peak may therefore be a readout of additional chromatin-related events, and genomic regions that are similarly marked may hint at common regulatory or architectural features. Excellent visualisation tools (e.g., the UCSC Genome Browser) enable researchers to appreciate such features for individual enrichment peaks. However, while automatically grouping such marks by shape similarity may be a valuable tool for hypothesis generation, it has remained a non-trivial task.
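The grouping problem described above can be sketched, under stated assumptions, as agglomerative clustering on a matrix of pairwise DTW distances. This is an illustration only: `cluster_by_shape`, the average-linkage rule, the fixed number of clusters and the toy profiles are hypothetical choices, and DGW's own procedure (which aligns and clusters simultaneously, and may use a different linkage, warping constraints and data structures) is not reproduced here. In the example, two shifted copies of a bimodal peak and two shifted copies of a broad plateau end up in separate groups.

```python
from itertools import combinations


def dtw_distance(x, y):
    """Classic DTW distance with absolute-difference local cost (hypothetical helper)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def cluster_by_shape(profiles, n_clusters):
    """Average-linkage agglomerative clustering on pairwise DTW distances.

    Returns a list of clusters, each a list of indices into `profiles`.
    """
    n = len(profiles)
    # Symmetric matrix of pairwise DTW distances, computed once.
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = dtw_distance(profiles[i], profiles[j])

    clusters = [[i] for i in range(n)]  # start with one cluster per region
    while len(clusters) > n_clusters:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
            d = total / (len(clusters[p]) * len(clusters[q]))
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best
        clusters[p].extend(clusters[q])  # merge the closest pair (p < q)
        del clusters[q]
    return clusters


regions = [
    [0, 1, 5, 1, 0, 0, 3, 0, 0, 0],  # bimodal
    [0, 0, 1, 5, 1, 0, 0, 3, 0, 0],  # same bimodal shape, shifted
    [0, 0, 0, 2, 2, 2, 2, 0, 0, 0],  # broad plateau
    [0, 0, 2, 2, 2, 2, 0, 0, 0, 0],  # same plateau, shifted
]
print(cluster_by_shape(regions, n_clusters=2))  # [[0, 1], [2, 3]]
```

On these toy profiles, plain Euclidean distance would actually place region 0 closer to the plateau in region 2 (6.0) than to its own shifted copy in region 1 (about 7.21), which is the kind of failure that motivates warping-based comparison.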

References

  1. Furey, T.S. (2012). "ChIP-seq and beyond: New and improved methodologies to detect and characterize protein-DNA interactions". Nature Reviews Genetics 13 (12): 840-52. doi:10.1038/nrg3306. PMC3591838. PMID 23090257. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591838. 
  2. Barski, A.; Cuddapah, S.; Cui, K. et al. (2007). "High-resolution profiling of histone methylations in the human genome". Cell 129 (4): 823-37. doi:10.1016/j.cell.2007.05.009. PMID 17512414. 
  3. Kundaje, A.; Kyriazopoulou-Panagiotopoulou, S.; Libbrecht, M. et al. (2012). "Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements". Genome Research 22 (9): 1735-47. doi:10.1101/gr.136366.111. PMC3431490. PMID 22955985. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431490. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.