Journal:From command-line bioinformatics to bioGUI

Full article title	From command-line bioinformatics to bioGUI
Journal	PeerJ
Author(s)	Joppich, Markus; Zimmer, Ralf
Author affiliation(s)	Ludwig-Maximilians-Universität München
Primary contact	Email: joppich at bio dot ifi dot lmu dot de
Editors	Gillespie, Joseph
Year published	2019
Volume and issue	7
Page(s)	e8111
DOI	10.7717/peerj.8111
ISSN	2167-8359
Distribution license	Creative Commons Attribution 4.0 International
Website	https://peerj.com/articles/8111/
Download	https://peerj.com/articles/8111.pdf (PDF)

This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed.

Abstract

Bioinformatics is a highly interdisciplinary field providing informatics applications for scientists from many disciplines. Installing and starting applications on the command line (CL) is inconvenient and inefficient for many scientists. Nonetheless, most methods are implemented with a command-line interface only. Providing a graphical user interface (GUI) for bioinformatics applications is one step toward routinely making CL-only applications more readily available to scientists, yielding a positive step toward more effective interdisciplinary work. With our bioGUI framework, we address two main problems of using CL bioinformatics applications. First, many tools work on UNIX-based systems only, while many scientists use Microsoft Windows. Second, scientists refrain from using CL tools, which, despite their reservations, could well support them in their research. With bioGUI install modules and templates, installing and using CL tools is made possible for most scientists, even on Windows, due to bioGUI’s support for Windows Subsystem for Linux. In addition, bioGUI templates can easily be created, making the bioGUI framework highly rewarding for developers. From the bioGUI repository it is possible to download, install, and use bioinformatics tools with just a few clicks.

Introduction

Many advances in bioinformatics rely on sophisticated applications. Examples include Trinity^[1] for de novo assembly in conjunction with Trimmomatic^[2], or the HISAT2, StringTie, and Ballgown pipeline for transcript-level expression analysis.^[3] These tools have in common that when locally installed, only a command-line interface (CLI) is provided, implying a burden for many conducting sequence analysis and alignments.^[4] Jellyfish^[5], Glimmer^[6], and HMMER natively run only in UNIX-environments and require a sophisticated setup on Windows. In addition, the installation of command-line (CL) tools is a challenge for non-computer specialists, for example, due to package dependency resolution. This problem has been addressed by the AlgoRun package^[7], providing a Docker-based repository of tools. Being a web-based service, it limits use to web-applicable data sizes, or local data must be made available to the Docker container in the cloud. While AlgoRun has the advantage of processing data anywhere, it relies on Docker. Docker may be run either on a local workstation or in the cloud. On a local workstation it can induce incompatibilities with existing software (using Hyper-V on Windows). A cloud-based service may conflict with data privacy guidelines^[8], for example, with respect to a possible de-anonymization of patient samples.^[9] Using Windows Subsystem for Linux (WSL) is often possible in such a scenario: it is provided as an app from the Microsoft Store.

Supplemental information

DOI 10.7717/peerj.8111/supp-1 - Survey questions on command-line tools and bioGUI: This is the original survey used to assess problems with current bioinformatics applications. (PDF)

DOI 10.7717/peerj.8111/supp-2 - Answers on the survey on command-line tools and bioGUI: Each column represents a single participant. Questions are in rows. (XLXS)

Acknowledgements

We thank Luisa F. Jimenez-Soto and Gergely Csaba for their valuable input as well as for reviewing the manuscript. We thank the participants in our survey for their time. We thank the reviewers for their constructive feedback.

Authors’ contributions

Markus Joppich conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. Ralf Zimmer conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Funding

This work was supported by the Deutsche Forschungsgemeinschaft (Collaborative Research Centre SFB 1123-2/Z2). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data availability

The bioGUI documentation is available here. In order to set up Windows Subsystem for Linux (required for using bioGUI on Windows), follow the steps documented here. bioGUI is open-source software. Releases and code are available on the GitHub project page. Additional software (cwl2biogui) is available here.

Competing interests

The authors declare that they have no competing interests.

References

↑ Grabherr, M.G.; Haas, B.J.; Yassour, M. et al. (2011). "Full-length transcriptome assembly from RNA-Seq data without a reference genome". Nature Biotechnology 29 (7): 644–52. doi:10.1038/nbt.1883. PMC PMC3571712. PMID 21572440. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571712.
↑ Bolger, A.M.; Lohse, M.; Usadel, B. (2014). "Trimmomatic: A flexible trimmer for Illumina sequence data". Bioinformatics 30 (15): 2114-20. doi:10.1093/bioinformatics/btu170. PMC PMC4103590. PMID 24695404. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590.
↑ Pertea, M.; Kim, D.; Pertea, G.M. et al. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. 11. pp. 1650–67. doi:10.1038/nprot.2016.095. PMC PMC5032908. PMID 27560171. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5032908.
↑ Morais, D.; Roesch, L.F.W.; Redmile-Gordon, M. et al. (2018). BTW-Bioinformatics Through Windows: An easy-to-install package to analyze marker gene data. 6. pp. e5299. doi:10.7717/peerj.5299. PMC PMC6074753. PMID 30083449. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074753.
↑ Marçais, G.; Kingsford, C. (2011). "A fast, lock-free approach for efficient parallel counting of occurrences of k-mers". Bioinformatics 27 (6): 764–70. doi:10.1093/bioinformatics/btr011. PMC PMC3051319. PMID 21217122. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3051319.
↑ Delcher, A.L.; Bratke, K.A.; Powers, E.C. et al. (2007). "Identifying bacterial genes and endosymbiont DNA with Glimmer". Bioinformatics 23 (6): 673–9. doi:10.1093/bioinformatics/btm009. PMC PMC2387122. PMID 17237039. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2387122.
↑ Hosny, A.; Vera-Licona, P.; Laubenbacher, R. et al. (2016). "AlgoRun: A Docker-based packaging system for platform-agnostic implemented algorithms". Bioinformatics 32 (15): 2396–8. doi:10.1093/bioinformatics/btw120. PMC PMC6280798. PMID 27153722. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6280798.
↑ Schadt, E.E. (2012). "The changing privacy landscape in the era of big data". Molecular Systems Biology 8: 612. doi:10.1038/msb.2012.47. PMC PMC3472686. PMID 22968446. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472686.
↑ Gymrek, M.; McGuire, A.L.; Golan, D. et al. (2013). "Identifying personal genomes by surname inference". Science 339 (6117): 321–4. doi:10.1126/science.1229566. PMID 23329047.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference. The original article lists references alphabetically, but this version—by design—lists them in order of appearance.