Journal:From command-line bioinformatics to bioGUI

From LIMSWiki
Revision as of 19:10, 6 January 2020 by Shawndouglas
Full article title: From command-line bioinformatics to bioGUI
Journal: PeerJ
Author(s): Joppich, Markus; Zimmer, Ralf
Author affiliation(s): Ludwig-Maximilians-Universität München
Primary contact (email): joppich at bio dot ifi dot lmu dot de
Editors: Gillespie, Joseph
Year published: 2019
Volume and issue: 7
Page(s): e8111
DOI: 10.7717/peerj.8111
ISSN: 2167-8359
Distribution license: Creative Commons Attribution 4.0 International
Website: https://peerj.com/articles/8111/
Download: https://peerj.com/articles/8111.pdf (PDF)

Abstract

Bioinformatics is a highly interdisciplinary field providing informatics applications for scientists from many disciplines. Installing and starting applications on the command line (CL) is inconvenient and inefficient for many scientists. Nonetheless, most methods are implemented with a command-line interface only. Providing a graphical user interface (GUI) for bioinformatics applications is one step toward routinely making CL-only applications more readily available to scientists, yielding a positive step toward more effective interdisciplinary work. With our bioGUI framework, we address two main problems of using CL bioinformatics applications. First, many tools work on UNIX-based systems only, while many scientists use Microsoft Windows. Second, scientists refrain from using CL tools, which, despite their reservations, could well support them in their research. With bioGUI install modules and templates, installing and using CL tools is made possible for most scientists, even on Windows, due to bioGUI’s support for Windows Subsystem for Linux. In addition, bioGUI templates can easily be created, making the bioGUI framework highly rewarding for developers. From the bioGUI repository it is possible to download, install, and use bioinformatics tools with just a few clicks.

Introduction

Many advances in bioinformatics rely on sophisticated applications. Examples include Trinity[1] for de novo assembly in conjunction with Trimmomatic[2], or the HISAT2, StringTie, and Ballgown pipeline for transcript-level expression analysis.[3] What these tools have in common is that, when installed locally, they provide only a command-line interface (CLI), which is a burden for the many researchers conducting sequence analyses and alignments who are not computing-adept.[4] Jellyfish[5], Glimmer[6], and HMMER run natively only in UNIX environments and require a sophisticated setup on Windows. In addition, installing command-line (CL) tools is a challenge for non-computer specialists, for example, due to package dependency resolution. This problem has been addressed by the AlgoRun package[7], which provides a Docker-based repository of tools. Because it is a web-based service, its use is limited to data sizes suitable for the web; otherwise, local data must be made available to the Docker container in the cloud. While AlgoRun has the advantage of processing data anywhere, it relies on Docker. Docker may be run either on a local workstation or in the cloud. On a local workstation, it can cause incompatibilities with existing software (when using Hyper-V on Windows). A cloud-based service may conflict with data privacy guidelines[8], for example, with respect to a possible de-anonymization of patient samples.[9] Using Windows Subsystem for Linux (WSL) is often possible in such a scenario: it is provided as an app from the Microsoft Store.
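To make the WSL scenario more concrete, the following minimal Python sketch shows how a Windows-side wrapper might hand a local file to a Linux CL tool running under WSL. It is an illustration only and not bioGUI's actual implementation. It assumes WSL's default behavior of mounting Windows drives under /mnt (this mount root is configurable), and the function names and the tool name some_tool with its --input option are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import List


def windows_to_wsl_path(path: str, mount_root: str = "/mnt") -> str:
    """Translate an absolute Windows drive path to its location inside WSL.

    Assumes the default WSL automount root (/mnt); this is configurable.
    """
    win = PureWindowsPath(path)
    # Accept only drive-letter paths such as C:\data\reads.fq.
    is_drive_path = len(win.drive) == 2 and win.drive[1] == ":"
    if not is_drive_path or not win.root:
        raise ValueError(f"expected an absolute Windows drive path, got {path!r}")
    drive_letter = win.drive[0].lower()
    # parts[0] is the anchor ("C:\\"); the rest are directory and file names.
    return "/".join([mount_root, drive_letter, *win.parts[1:]])


def build_wsl_command(tool: str, *args: str) -> List[str]:
    """Assemble an argument list that runs `tool` in the default WSL distribution."""
    return ["wsl", "--exec", tool, *args]


# Hypothetical usage: "some_tool" and "--input" are placeholders, not a real CLI.
reads = windows_to_wsl_path(r"C:\data\reads.fq")
command = build_wsl_command("some_tool", "--input", reads)
```

On a Windows machine with WSL installed, the resulting list could be passed to subprocess.run; the sketch only assembles the command so that it can be inspected without WSL being present.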

A frequent argument for not providing a graphical user interface (GUI) is the overhead of developing it and the effort required to make it truly “user centered.” Often GUIs are simply deemed unnecessary by application developers. However, one can be skeptical whether scientists who are not computing-adept can efficiently use CLIs in their research. In fact, bioinformatician Dr. István Albert[10] notes that “Bioinformatics, unfortunately, has quite the number of methods that represent the disconnect of the Ivory Tower.” Pavelin et al.[11] note that software is often developed without a focus on the usability of interfaces (for end-users). While this does not imply that any GUI is helpful, we argue that without a GUI, the otherwise highly sophisticated CL applications are not very useful for some scientists. Besides, a GUI is often more convenient and helps to avoid using the wrong parameters, especially if an application is not yet routinely used in a lab. University of Western Ontario's David Roy Smith[12] also states that GUI-driven applications make daily work in biology or medical labs easier. Smith remarks that many end-users have a “penchant for point and click” and are not able to use CL tools effectively. Still, they should have the ability to access and analyze their own data. Many proprietary software solutions address this demand: they allow GUI-based data management, while also being extensible via plug-ins. Smith[13] also points out that one of the biggest advantages of such plug-ins is that they combine the power of peer-reviewed algorithms with a user-friendly GUI. Thus, providing a GUI is an important step toward making methods usable by end-users.

Visne et al.[14] present a universal GUI for R aiming to close the gap between R developers and GUI-dependent users with limited R scripting skills. Additionally, web-based workflow systems such as Galaxy[15] and Yabi[16] provide a means to easily execute bioinformatics applications, but they tend to focus on more complex workflows. Moreover, both Galaxy and Yabi are designed to be run and maintained by bioinformaticians for several users and are not meant to be run by a single individual, as in small labs. More recently, Morais et al.[4] stated that the accessibility of bioinformatics applications is one of the main challenges of contemporary biology, and that one of the main problems for users is the struggle of using CLIs. While a GUI does not make an application user-friendly per se, it helps to make it more accessible by lowering the barrier to using it.[14][4][17][18][19]


Supplemental information

  • DOI 10.7717/peerj.8111/supp-1 - Survey questions on command-line tools and bioGUI: This is the original survey used to assess problems with current bioinformatics applications. (PDF)
  • DOI 10.7717/peerj.8111/supp-2 - Answers on the survey on command-line tools and bioGUI: Each column represents a single participant. Questions are in rows. (XLSX)

Acknowledgements

We thank Luisa F. Jimenez-Soto and Gergely Csaba for their valuable input as well as for reviewing the manuscript. We thank the participants in our survey for their time. We thank the reviewers for their constructive feedback.

Authors’ contributions

Markus Joppich conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. Ralf Zimmer conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Funding

This work was supported by the Deutsche Forschungsgemeinschaft (Collaborative Research Centre SFB 1123-2/Z2). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data availability

The bioGUI documentation is available here. In order to set up Windows Subsystem for Linux (required for using bioGUI on Windows), follow the steps documented here. bioGUI is open-source software. Releases and code are available on the GitHub project page. Additional software (cwl2biogui) is available here.

Competing interests

The authors declare that they have no competing interests.

References

  1. Grabherr, M.G.; Haas, B.J.; Yassour, M. et al. (2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nature Biotechnology 29 (7): 644–52. doi:10.1038/nbt.1883. PMC PMC3571712. PMID 21572440. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571712. 
  2. Bolger, A.M.; Lohse, M.; Usadel, B. (2014). "Trimmomatic: A flexible trimmer for Illumina sequence data". Bioinformatics 30 (15): 2114–20. doi:10.1093/bioinformatics/btu170. PMC PMC4103590. PMID 24695404. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590. 
  3. Pertea, M.; Kim, D.; Pertea, G.M. et al. (2016). "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". Nature Protocols 11: 1650–67. doi:10.1038/nprot.2016.095. PMC PMC5032908. PMID 27560171. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5032908. 
  4. Morais, D.; Roesch, L.F.W.; Redmile-Gordon, M. et al. (2018). "BTW-Bioinformatics Through Windows: An easy-to-install package to analyze marker gene data". PeerJ 6: e5299. doi:10.7717/peerj.5299. PMC PMC6074753. PMID 30083449. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074753. 
  5. Marçais, G.; Kingsford, C. (2011). "A fast, lock-free approach for efficient parallel counting of occurrences of k-mers". Bioinformatics 27 (6): 764–70. doi:10.1093/bioinformatics/btr011. PMC PMC3051319. PMID 21217122. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3051319. 
  6. Delcher, A.L.; Bratke, K.A.; Powers, E.C. et al. (2007). "Identifying bacterial genes and endosymbiont DNA with Glimmer". Bioinformatics 23 (6): 673–9. doi:10.1093/bioinformatics/btm009. PMC PMC2387122. PMID 17237039. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2387122. 
  7. Hosny, A.; Vera-Licona, P.; Laubenbacher, R. et al. (2016). "AlgoRun: A Docker-based packaging system for platform-agnostic implemented algorithms". Bioinformatics 32 (15): 2396–8. doi:10.1093/bioinformatics/btw120. PMC PMC6280798. PMID 27153722. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6280798. 
  8. Schadt, E.E. (2012). "The changing privacy landscape in the era of big data". Molecular Systems Biology 8: 612. doi:10.1038/msb.2012.47. PMC PMC3472686. PMID 22968446. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472686. 
  9. Gymrek, M.; McGuire, A.L.; Golan, D. et al. (2013). "Identifying personal genomes by surname inference". Science 339 (6117): 321–4. doi:10.1126/science.1229566. PMID 23329047. 
  10. Albert, I. (2016). "The Biostar Handbook". https://biostar.myshopify.com/. 
  11. Pavelin, K.; Cham, J.A.; de Matos, P. et al. (2012). "Bioinformatics meets user-centred design: a perspective". PLoS Computational Biology 8 (7): e1002554. doi:10.1371/journal.pcbi.1002554. PMC PMC3395592. PMID 22807660. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3395592. 
  12. Smith, D.R. (2013). "The battle for user-friendly bioinformatics". Frontiers in Genetics 4: 187. doi:10.3389/fgene.2013.00187. PMC PMC3778374. PMID 24065986. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3778374. 
  13. Smith, D.R. (2015). "Buying in to bioinformatics: An introduction to commercial sequence analysis software". Briefings in Bioinformatics 16: 700–9. doi:10.1093/bib/bbu030. PMC PMC4501248. PMID 25183247. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4501248. 
  14. Visne, I.; Dilaveroglu, E.; Vierlinger, K. et al. (2009). "RGG: A general GUI Framework for R scripts". BMC Bioinformatics 10: 74. doi:10.1186/1471-2105-10-74. PMC PMC2653488. PMID 19254356. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2653488. 
  15. Afgan, E.; Baker, D.; van den Beek, M. et al. (2016). "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update". Nucleic Acids Research 44 (W1): W3–W10. doi:10.1093/nar/gkw343. PMC PMC4987906. PMID 27137889. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987906. 
  16. Hunter, A.A.; Macgregor, A.B.; Szabo, T.O. et al. (2012). "Yabi: An online research environment for grid, high performance and cloud computing". Source Code for Biology and Medicine 7 (1): 1. doi:10.1186/1751-0473-7-1. PMC PMC3298538. PMID 22333270. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298538. 
  17. Xu, G.; Strong, M.J.; Lacey, M.R. et al. (2014). "RNA CoMPASS: A dual approach for pathogen and host transcriptome analysis of RNA-seq datasets". PLoS One 9 (2): e89445. doi:10.1371/journal.pone.0089445. PMC PMC3934900. PMID 24586784. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3934900. 
  18. Anslan, S.; Bahram, M.; Hiirsalu, I. et al. (2017). "PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data". Molecular Ecology Resources 17 (6): e234–e240. doi:10.1111/1755-0998.12692. PMID 28544559. 
  19. Vetrovský, T.; Baldrian, P.; Morais, D. (2018). "SEED 2: A user-friendly platform for amplicon high-throughput sequencing data analyses". Bioinformatics 34 (13): 2292–2294. doi:10.1093/bioinformatics/bty071. PMC PMC6022770. PMID 29452334. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022770. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference. The original article lists references alphabetically, but this version—by design—lists them in order of appearance.