Revision as of 19:10, 30 January 2017
Full article title | DGW: An exploratory data analysis tool for clustering and visualisation of epigenomic marks |
---|---|
Journal | BMC Bioinformatics |
Author(s) | Lukauskas, Saulius; Visintainer, Roberto; Sanguinetti, Guido; Schweikert, Gabriele B. |
Author affiliation(s) | Imperial College London, Fondazione Bruno Kessler, University of Edinburgh |
Primary contact | Email: saulius dot lukauskas13 at imperial dot ac dot uk |
Year published | 2016 |
Volume and issue | 17(Suppl 16) |
Page(s) | 447 |
DOI | 10.1186/s12859-016-1306-0 |
ISSN | 1471-2105 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1306-0 |
Download | http://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1306-0 (PDF) |
Abstract
Background: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extended over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent.
Results: We present DGW, an open-source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses dynamic time warping to adaptively rescale and align genomic distances, which allows regions of interest with similar shapes to be grouped, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project.
Conclusions: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open-source Python package.
Keywords: Clustering, ChIP-seq, epigenetics, dynamic time warping
Background
Sequencing-based technologies such as ChIP-seq and DNase-seq (reviewed, e.g., in Furey 2012[1]) have revolutionized our understanding of chromatin structure and function, yielding deep insights into the importance of epigenomic marks in the basic processes of life. The emergent picture is that gene expression is controlled by a complex interplay of protein binding and epigenomic modifications. While histone marks (and other epigenomic marks) can be measured in a high-throughput way, exploratory data analysis techniques for these data types are still being developed. Epigenomic marks exhibit characteristics that distinguish them fundamentally from, e.g., mRNA gene expression measurements: they are spatially extended across regions as wide as several kilobases, within which they often present interesting local structures, such as multiple peaks and troughs[2] and intriguing asymmetries[3] (see Fig. 1).
The shape of epigenomic marks appears to be highly conserved across replicate data sets, and this conservation has recently been exploited for statistical testing [4]. While the biological reasons for such conservation are not entirely clear, recent studies have suggested that both architectural and regulatory aspects may be at play. Bieberstein and colleagues showed intriguing patterns of accumulation of the histone marks H3K4me3 and H3K9ac at splice sites [5], hinting at an architectural origin of the shape of the marks. More recently, Benveniste et al. showed that histone marks can be very well predicted genome-wide by the binding patterns of transcription factors (TFs) [6]. The shape of a peak may therefore be a readout of additional chromatin-related events, and genomic regions that are similarly marked may hint at common regulatory or architectural features. Excellent visualisation tools (e.g., the UCSC Genome Browser) enable researchers to appreciate such features for individual enrichment peaks. However, while automatically grouping such marks based on shape similarity may be a valuable tool for hypothesis generation, it has remained a non-trivial task.
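To illustrate why shape-based grouping calls for something more flexible than a position-by-position comparison, the following is a minimal sketch of textbook dynamic time warping (DTW) on toy one-dimensional profiles. It is not taken from DGW itself: the function name `dtw_distance` and the profiles `narrow`, `wide` and `flat` are hypothetical, chosen only for illustration, and the absolute difference is assumed as the local cost.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D profiles (illustrative only)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = local + min(
                cost[i - 1, j],      # a advances, b is held
                cost[i, j - 1],      # b advances, a is held
                cost[i - 1, j - 1],  # both advance
            )
    return cost[n, m]


# Hypothetical toy profiles: a two-peak shape, the same shape stretched
# to twice the width, and a featureless profile of the same width.
narrow = [0, 1, 3, 1, 0, 2, 0]
wide = [0, 0, 1, 1, 3, 3, 1, 1, 0, 0, 2, 2, 0, 0]
flat = [1] * 14

print(dtw_distance(narrow, wide))
print(dtw_distance(narrow, flat))
```

In this toy case the stretched profile aligns to the narrow one at no cost, because warping absorbs the difference in width, whereas the flat profile does not. A Euclidean comparison could not even be computed directly here, since the profiles differ in length.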
References
1. Furey, T.S. (2012). "ChIP-seq and beyond: New and improved methodologies to detect and characterize protein-DNA interactions". Nature Reviews Genetics 13 (12): 840-52. doi:10.1038/nrg3306. PMC3591838. PMID 23090257. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3591838.
2. Barski, A.; Cuddapah, S.; Cui, K. et al. (2007). "High-resolution profiling of histone methylations in the human genome". Cell 129 (4): 823-37. doi:10.1016/j.cell.2007.05.009. PMID 17512414.
3. Kundaje, A.; Kyriazopoulou-Panagiotopoulou, S.; Libbrecht, M. et al. (2012). "Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements". Genome Research 22 (9): 1735-47. doi:10.1101/gr.136366.111. PMC3431490. PMID 22955985. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431490.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.