Journal:Intervene: A tool for intersection and visualization of multiple gene or genomic region sets

From LIMSWiki
Revision as of 01:29, 6 June 2017 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title Intervene: A tool for intersection and visualization of multiple gene or genomic region sets
Journal BMC Bioinformatics
Author(s) Khan, Aziz; Mathelier, Anthony
Author affiliation(s) Centre for Molecular Medicine Norway, Norwegian Radium Hospital
Primary contact Email: aziz dot khan at ncmm dot uio dot no and anthony dot mathelier at ncmm dot uio dot no
Year published 2017
Volume and issue 18
Page(s) 287
DOI 10.1186/s12859-017-1708-7
ISSN 1471-2105
Distribution license Creative Commons Attribution 4.0 International
Website https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1708-7
Download https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-017-1708-7 (PDF)

Abstract

Background: A common task for scientists relies on comparing lists of genes or genomic regions derived from high-throughput sequencing experiments. While several tools exist to intersect and visualize sets of genes, similar tools dedicated to the visualization of genomic region sets are currently limited.

Results: To address this gap, we have developed the Intervene tool, which provides an easy and automated interface for the effective intersection and visualization of genomic region or list sets, thus facilitating their analysis and interpretation. Intervene contains three modules: venn to generate Venn diagrams of up to six sets, upset to generate UpSet plots of multiple sets, and pairwise to compute and visualize intersections of multiple sets as clustered heat maps. Intervene, and its interactive web ShinyApp companion, generate publication-quality figures for the interpretation of genomic region and list sets.

Conclusions: Intervene and its web application companion provide an easy command line and an interactive web interface to compute intersections of multiple genomic and list sets. They have the capacity to plot intersections using easy-to-interpret visual approaches. Intervene is developed and designed to meet the needs of both computer scientists and biologists. The source code is freely available at https://bitbucket.org/CBGR/intervene, with the web application available at https://asntech.shinyapps.io/intervene.

Keywords: visualization, Venn diagrams, UpSet plots, heat maps, genome analysis

Background

Effective visualization of transcriptomic, genomic, and epigenomic data generated by next-generation sequencing-based high-throughput assays has become an area of great interest. Most of the data sets generated by such assays are lists of genes or variants, and genomic region sets. The genomic region sets represent genomic locations for specific features, such as transcription factor-DNA interactions, transcription start sites, histone modifications, and DNase hypersensitivity sites. A common task in the interpretation of these features is to find similarities, differences, and enrichments between such sets, which come from different samples, experimental conditions, or cell and tissue types.

Classically, the intersection or overlap between different sets, such as gene lists, is represented by Venn diagrams[1] or Edwards-Venn diagrams.[2] If the number of sets exceeds four, such diagrams become complex and difficult to interpret. The key challenge is that there are 2^n combinations to visually represent when considering n sets. An alternative approach, the UpSet plot, was introduced to depict the intersection of more than three sets.[3] The advantage of UpSet plots is their capacity to rank the intersections and, optionally, hide combinations without intersection, which is not possible using a Venn diagram. However, with a large number of sets, UpSet plots become an ineffective way of illustrating set intersections. To visualize a large number of sets, one can represent pairwise intersections using a clustered heat map as suggested by Lex and Gehlenborg.[4]
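The combinatorial growth described above can be made concrete with a short Python sketch. It is an illustration only, not Intervene's implementation: the function name exclusive_intersections and the gene lists are hypothetical. For n named sets it enumerates the 2^n - 1 non-empty combinations of sets and collects, for each, the items found in exactly those sets and no others, which is the quantity a Venn region or an UpSet bar reports.

```python
from itertools import combinations


def exclusive_intersections(named_sets):
    """Map each non-empty combination of set names to the items that
    belong to exactly those sets and to no other set."""
    names = sorted(named_sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items shared by every set in the combination.
            inside = set.intersection(*(named_sets[n] for n in combo))
            # Items present in any set outside the combination.
            others = [named_sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            result[combo] = inside - outside
    return result


# Hypothetical gene lists, for illustration only.
example = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
regions = exclusive_intersections(example)

# An UpSet-style view can hide the empty combinations.
non_empty = {combo: items for combo, items in regions.items() if items}
```

With three sets there are seven combinations to draw; with six there are already 63, which is why hiding empty combinations and ranking the rest becomes useful.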

There are several web applications and R packages available to compute and visualize the intersection of up to six list sets using Venn diagrams. Although tools exist to perform genomic region set intersections[5][6][7], there is a limited number of tools available to visualize them.[5][6] To our knowledge, no tool exists to generate UpSet plots for genomic region sets. Consequently, there is a great need for integrative tools to compute and visualize the intersection of multiple sets of both genomic regions and gene/list sets.

To address this need, we developed Intervene, an easy-to-use command line tool to compute and visualize intersections of genomic regions with Venn diagrams, UpSet plots, or clustered heat maps. Moreover, we provide an interactive web application companion to upload list sets or the output of Intervene to further customize plots.

Implementation

Intervene comes as a command line tool, along with an interactive Shiny web application to customize the visual representation of intersections. The command line tool is implemented in Python (version 2.7) and the R programming language (version 3.3.2). The build also works with Python versions 3.4, 3.5, and 3.6. The accompanying web interface is developed using Shiny (version 1.0.0), a web application framework for R. Intervene uses pybedtools[6] to perform genomic region set intersections and Seaborn (https://seaborn.pydata.org/), Matplotlib[7], UpSetR[8], and Corrplot[9] to generate figures. The web application uses the R package Vennerable[10] for different types of Venn diagrams, UpSetR for UpSet plots, and heatmap.2 and Corrplot for pairwise intersection clustered heat maps. The UpSet module of the web ShinyApp was derived from the UpSetR[8] ShinyApp, which was extended by adding more options and features to customize the UpSet plots.
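Intervene delegates genomic region intersection to pybedtools, so the following is not its code. As a rough illustration of what "intersecting" two region sets means, the pure-Python sketch below counts the regions of one set that overlap at least one region of another. The function names, the example peaks, and the choice of 0-based, half-open coordinates (the BED convention) are assumptions made for this example, and the quadratic scan is far less efficient than what dedicated interval tools do.

```python
def overlaps(a, b):
    """Return True if two (chrom, start, end) regions share at least one base.
    Coordinates are assumed to be 0-based and half-open, as in BED files."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query, target):
    """Count the regions in `query` that overlap any region in `target`.
    A naive all-against-all scan, for illustration only."""
    return sum(any(overlaps(q, t) for t in target) for q in query)


# Hypothetical peak sets, for illustration only.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
peaks_b = [("chr1", 150, 250), ("chr2", 80, 120)]

# Only the first region of peaks_a overlaps peaks_b; the chr2 regions
# are adjacent (end 80, start 80) and therefore share no base.
shared = count_overlapping(peaks_a, peaks_b)
```

Note that the count is not symmetric in general: one broad region in one set can overlap several narrow regions in the other.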

Intervene can be installed by using pip install intervene or from the source code available on Bitbucket at https://bitbucket.org/CBGR/intervene. The tool has been tested on Linux and Mac systems. The Shiny web application is hosted with shinyapps.io by RStudio, and is compatible with all modern web browsers. Detailed documentation, including installation instructions and how to use the tool, is provided in Additional file 1 and is available at http://intervene.readthedocs.io.

Results

An integrated tool for effective visualization of multiple set intersections

As visualization of sets and their intersections is becoming more and more challenging due to the increasing number of generated data sets, there is a strong need to have an integrated tool to compute and visualize intersections effectively. To address this challenge, we have developed Intervene, which is composed of three different modules, accessible through the subcommands venn, upset, and pairwise. Intervene accepts two types of input files: genomic regions in BED, GFF, or VCF format and gene/name lists in plain text format. A detailed sketch of Intervene’s command line interface and web application utility with types of inputs is provided in Fig. 1.



Figure 1. A sketch of Intervene’s command line interface and web application, and input data types
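For gene/name lists, the kind of matrix that a pairwise comparison produces can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pairwise module itself: the function name pairwise_counts and the sample lists are hypothetical, and the sketch reports raw shared-item counts only, whereas the actual module may offer other measures and clusters the result before drawing a heat map.

```python
def pairwise_counts(named_sets):
    """Return the sorted set names and a square matrix whose entry [i][j]
    is the number of items shared by set i and set j."""
    names = sorted(named_sets)
    matrix = [
        [len(named_sets[row] & named_sets[col]) for col in names]
        for row in names
    ]
    return names, matrix


# Hypothetical name lists, as they might be read from plain text files
# with one name per line.
lists = {
    "sample1": {"TP53", "MYC", "GATA1"},
    "sample2": {"MYC", "GATA1", "SOX2"},
    "sample3": {"SOX2"},
}
names, matrix = pairwise_counts(lists)
```

The diagonal holds each set's own size and the matrix is symmetric, so a heat map of it can be read from either triangle.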

Intervene provides flexibility to the user to choose figure colors, label text, size, resolution, and type to make them publication-standard quality. To read the help about any module, the user can type intervene <subcommand> --help on the command line. Furthermore, Intervene produces results as text files, which can be easily imported to the web application for interactive visualization and customization of plots (see “An interactive web application” section).

Venn diagrams module

Venn diagrams are the classical approach to show intersections between sets. There are several web-based applications and R packages available to visualize intersections of up to six list sets in classical Venn, Euler, or Edwards' diagrams.[11][12][13][14][15][16] However, a very limited number of tools are available to visualize genomic region intersections using classical Venn diagrams.[5][6]

References

  1. Venn, J. (1880). "On the diagrammatic and mechanical representation of propositions and reasonings". Philosophical Magazine and Journal of Science 10 (59): 1–18. doi:10.1080/14786448008626877. 
  2. Edwards, A.W.F. (2004). Cogwheels of the Mind: The Story of Venn Diagrams. Johns Hopkins University Press. pp. 128. ISBN 9780801874345. 
  3. Lex, A.; Gehlenborg, N.; Strobelt, H. et al. (2014). "UpSet: Visualization of Intersecting Sets". IEEE Transactions on Visualization and Computer Graphics 20 (12): 1983-92. doi:10.1109/TVCG.2014.2346248. 
  4. Lex, A.; Gehlenborg, N. (2014). "Points of view: Sets and intersections". Nature Methods 11: 779. doi:10.1038/nmeth.3033. 
  5. 5.0 5.1 5.2 Zhu, L.J.; Gazin, C.; Lawson, N.D. et al. (2010). "ChIPpeakAnno: A Bioconductor package to annotate ChIP-seq and ChIP-chip data". BMC Bioinformatics 11: 237. doi:10.1186/1471-2105-11-237. PMC PMC3098059. PMID 20459804. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098059. 
  6. 6.0 6.1 6.2 6.3 Dale, R.K.; Pedersen, B.S.; Quinlan, A.R. (2011). "Pybedtools: A flexible Python library for manipulating genomic datasets and annotations". Bioinformatics 27 (24): 3423–4. doi:10.1093/bioinformatics/btr539. PMC PMC3232365. PMID 21949271. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3232365. 
  7. 7.0 7.1 Hunter, J.D. (2007). "Matplotlib: A 2D Graphics Environment". Computing in Science & Engineering 9 (3). doi:10.1109/MCSE.2007.55. 
  8. 8.0 8.1 Conway, J.R.; Lex, A.; Gehlenborg, N. (25 March 2017). "UpSetR: An R Package For The Visualization Of Intersecting Sets And Their Properties". bioRxiv. doi:10.1101/120600. 
  9. Wei, T.; Simko, V. (21 April 2016). "corrplot: Visualization of a Correlation Matrix". https://cran.r-project.org/package=corrplot. 
  10. Swinton, J. (23 September 2009). "Venn diagrams in R with Vennerable package". https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/Vennerable/inst/doc/Venn.pdf?revision=58&root=vennerable. 
  11. Hulsen, T.; de Vlieg, J.; Alkema, W. (2008). "BioVenn - A web application for the comparison and visualization of biological lists using area-proportional Venn diagrams". BMC Genomics 9: 488. doi:10.1186/1471-2164-9-488. PMC PMC2584113. PMID 18925949. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2584113. 
  12. Lam, F.; Lalansingh, C.M.; Babaran, H.E. (2016). "VennDiagramWeb: A web application for the generation of highly customizable Venn and Euler diagrams". BMC Bioinformatics 17 (1): 401. doi:10.1186/s12859-016-1281-5. PMC PMC5048655. PMID 27716034. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048655. 
  13. Bardou, P.; Mariette, J.; Escudié, F. et al. (2014). "jvenn: An interactive Venn diagram viewer". BMC Bioinformatics 15: 293. doi:10.1186/1471-2105-15-293. PMC PMC4261873. PMID 25176396. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4261873. 
  14. Lin, G.; Chai, J.; Yuan, S. et al. (2016). "VennPainter: A Tool for the Comparison and Identification of Candidate Genes Based on Venn Diagrams". PLoS One 11 (4): e0154315. doi:10.1371/journal.pone.0154315. PMC PMC4847855. PMID 27120465. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4847855. 
  15. Martin, B.; Chadwick, W.; Yi, T. et al. (2012). "VENNTURE--A novel Venn diagram investigational tool for multiple pharmacological dataset analysis". PLoS One 7 (5): e36911. doi:10.1371/journal.pone.0036911. PMC PMC3351456. PMID 22606307. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351456. 
  16. Heberle, H.; Meirelles, G.V.; da Silva, F.R. et al. (2015). "InteractiVenn: A web-based tool for the analysis of sets through Venn diagrams". BMC Bioinformatics 16: 169. doi:10.1186/s12859-015-0611-3. PMC PMC4455604. PMID 25994840. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4455604. 

Notes

This presentation is faithful to the original, with only a few minor changes. In some cases important information was missing from the references, and that information was added. Some grammar was corrected when necessary.