Journal:Intervene: A tool for intersection and visualization of multiple gene or genomic region sets

From LIMSWiki
Revision as of 23:24, 5 June 2017 by Shawndouglas (talk | contribs) (Created stub. Saving and adding more.)
Full article title Intervene: A tool for intersection and visualization of multiple gene or genomic region sets
Journal BMC Bioinformatics
Author(s) Khan, Aziz; Mathelier, Anthony
Author affiliation(s) Centre for Molecular Medicine Norway, Norwegian Radium Hospital
Primary contact Email: aziz dot khan at ncmm dot uio dot no and anthony dot mathelier at ncmm dot uio dot no
Year published 2017
Volume and issue 18
Page(s) 287
DOI 10.1186/s12859-017-1708-7
ISSN 1471-2105
Distribution license Creative Commons Attribution 4.0 International
Website https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1708-7
Download https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-017-1708-7 (PDF)

Abstract

Background: A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited.

Results: To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets.

Conclusions: Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene, with the web application available at https://asntech.shinyapps.io/intervene.

Keywords: visualization, Venn diagrams, UpSet plots, heat maps, genome analysis

Background

Effective visualization of transcriptomic, genomic, and epigenomic data generated by next-generation sequencing-based high-throughput assays has become an area of great interest. Most of the data sets generated by such assays are lists of genes or variants, and genomic region sets. The genomic region sets represent genomic locations for specific features, such as transcription factor – DNA interactions, transcription start sites, histone modifications, and DNase hypersensitivity sites. A common task in the interpretation of these features is to find similarities, differences, and enrichments between such sets, which come from different samples, experimental conditions, or cell and tissue types.
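To make the notion of comparing genomic region sets concrete, the following minimal Python sketch counts how many regions of one set overlap at least one region of another. It is an illustration only and is not Intervene's implementation: the function names, the (chromosome, start, end) tuple layout, the half-open coordinate convention, and the example coordinates are all assumptions made for this sketch.

```python
# Illustrative sketch only; names and data are hypothetical, not from Intervene.

def overlaps(a, b):
    """Return True if two (chrom, start, end) regions share at least one base.

    Coordinates are assumed to be half-open: start is included, end is not.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query, reference):
    """Count the regions in `query` that overlap any region in `reference`.

    This is a naive scan over all pairs, fine for small examples but slow
    for real data sets, where an indexed or sorted approach is preferable.
    """
    return sum(any(overlaps(q, r) for r in reference) for q in query)


# Made-up example regions for two hypothetical experiments.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks_b = [("chr1", 150, 250), ("chr2", 300, 400), ("chr1", 600, 700)]

shared = count_overlapping(peaks_a, peaks_b)
print(f"{shared} of {len(peaks_a)} regions in A overlap a region in B")
```

Note that, unlike intersecting lists of gene names, this count is not symmetric in general: one wide region in one set may overlap several narrow regions in the other.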

Classically, the intersection or overlap between different sets, such as gene lists, is represented by Venn diagrams[1] or Edwards-Venn diagrams.[2] If the number of sets exceeds four, such diagrams become complex and difficult to interpret. The key challenge is that there are 2^n combinations to visually represent when considering n sets. An alternative approach, the UpSet plot, was introduced to depict the intersection of more than three sets.[3] The advantage of UpSet plots is their capacity to rank the intersections and, optionally, to hide combinations without intersection, which is not possible using a Venn diagram. However, with a large number of sets, UpSet plots become an ineffective way of illustrating set intersections. To visualize a large number of sets, one can represent pairwise intersections using a clustered heat map, as suggested by Lex and Gehlenborg.[4]
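The combinatorial growth described above can be sketched in a few lines of Python. The example below enumerates the 2^n - 1 non-empty combinations of n sets (the remaining combination, items belonging to none of the sets, is omitted) and also builds the matrix of pairwise intersection sizes that a heat map would display. The function names and the example gene lists are hypothetical and chosen only for illustration; this is not Intervene's own code.

```python
# Illustrative sketch only; names and data are hypothetical, not from Intervene.
from itertools import combinations


def exclusive_intersections(named_sets):
    """Map each non-empty combination of set names to the items that occur
    in exactly those sets and in no others (one Venn/UpSet region each)."""
    names = sorted(named_sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(named_sets[n] for n in combo))
            others = [named_sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            result[combo] = inside - outside
    return result


def pairwise_counts(named_sets):
    """Return the sorted set names and a square matrix of intersection sizes."""
    names = sorted(named_sets)
    matrix = [[len(named_sets[a] & named_sets[b]) for b in names] for a in names]
    return names, matrix


# Made-up gene lists for three hypothetical experiments.
gene_lists = {
    "A": {"TP53", "MYC", "GATA1"},
    "B": {"MYC", "GATA1", "SOX2"},
    "C": {"GATA1", "SOX2", "KLF4"},
}

regions = exclusive_intersections(gene_lists)
names, matrix = pairwise_counts(gene_lists)

# Rank combinations by size and hide the empty ones, as an UpSet plot can.
for combo, items in sorted(regions.items(), key=lambda kv: -len(kv[1])):
    if items:
        print("&".join(combo), sorted(items))
```

With three sets there are only seven such regions, but the number doubles with every added set, which is why ranking and filtering (UpSet) or reducing to pairs (heat map) becomes necessary.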

There are several web applications and R packages available to compute and visualize intersections of up to six list sets by using Venn diagrams. Although tools exist to perform genomic region set intersections[5][6][7], there is a limited number of tools available to visualize them.[5][6] To our knowledge, no tool exists to generate UpSet plots for genomic region sets. Consequently, there is a great need for integrative tools to compute and visualize intersections of multiple sets of both genomic regions and gene/list sets.

To address this need, we developed Intervene, an easy-to-use command line tool to compute and visualize intersections of genomic regions with Venn diagrams, UpSet plots, or clustered heat maps. Moreover, we provide an interactive web application companion to upload list sets or the output of Intervene to further customize plots.

References

  1. Venn, J. (1880). "On the diagrammatic and mechanical representation of propositions and reasonings". Philosophical Magazine and Journal of Science 10 (59): 1–18. doi:10.1080/14786448008626877. 
  2. Edwards, A.W.F. (2004). Cogwheels of the Mind: The Story of Venn Diagrams. Johns Hopkins University Press. pp. 128. ISBN 9780801874345. 
  3. Lex, A.; Gehlenborg, N.; Strobelt, H. et al. (2014). "UpSet: Visualization of Intersecting Sets". IEEE Transactions on Visualization and Computer Graphics 20 (12): 1983–92. doi:10.1109/TVCG.2014.2346248. 
  4. Lex, A.; Gehlenborg, N. (2014). "Points of view: Sets and intersections". Nature Methods 11: 779. doi:10.1038/nmeth.3033. 
  5. Zhu, L.J.; Gazin, C.; Lawson, N.D. et al. (2010). "ChIPpeakAnno: A Bioconductor package to annotate ChIP-seq and ChIP-chip data". BMC Bioinformatics 11: 237. doi:10.1186/1471-2105-11-237. PMC3098059. PMID 20459804. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098059. 
  6. Dale, R.K.; Pedersen, B.S.; Quinlan, A.R. (2011). "Pybedtools: A flexible Python library for manipulating genomic datasets and annotations". Bioinformatics 27 (24): 3423–4. doi:10.1093/bioinformatics/btr539. PMC3232365. PMID 21949271. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3232365. 
  7. Hunter, J.D. (2007). "Matplotlib: A 2D Graphics Environment". Computing in Science & Engineering 9 (3). doi:10.1109/MCSE.2007.55. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Some grammar was corrected when necessary.