Journal:4273π: Bioinformatics education on low cost ARM hardware

From LIMSWiki
Revision as of 19:22, 13 November 2015 by Shawndouglas (talk | contribs) (Added content. Saving and adding more.)
Full article title: 4273π: Bioinformatics education on low cost ARM hardware
Journal: BMC Bioinformatics
Author(s): Barker, Daniel; Ferrier, David E.K.; Holland, Peter W.H.; Mitchell, John B.O.; Plaisier, Heleen; Ritchie, Michael G.; Smart, Steven D.
Author affiliation(s): University of St. Andrews, University of Oxford
Primary contact: db60@st-andrews.ac.uk (email)
Year published: 2013
Volume and issue: 14
Page(s): 243
DOI: 10.1186/1471-2105-14-243
ISSN: 1471-2105
Distribution license: Creative Commons Attribution 2.0 Generic
Website: http://www.biomedcentral.com/1471-2105/14/243
Download: http://www.biomedcentral.com/content/pdf/1471-2105-14-243.pdf (PDF)

Abstract

Background: Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access.

Results: We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. It includes minor customisations for classroom use, together with our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. The course is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013.

Conclusions: 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.

Keywords: Bioinformatics education; Teaching material; Raspberry Pi; Linux

Background

Bioinformatics is increasingly included in the undergraduate curriculum for biology students. Teaching bioinformatics is made difficult, however, by the constraints of a typical university computer classroom. Some areas of basic bioinformatics may be taught using such classrooms, where all that is required is an Internet connection and Web browser (e.g. BLAST[1] searches at the NCBI[2]). More in-depth teaching requires the re-creation of a bioinformatics research environment, consisting of a Linux or UNIX operating system, standard GNU utilities[3], specialist bioinformatics software, and sequence databases.

Undergraduate modules ought, ideally, to prepare students for research in an academic research group. Students who do pursue a research career will often find that institutional computer support is targeted to generic computer use (e.g. Microsoft software) rather than installing and maintaining systems suitable for bioinformatics. Particularly outside of bioinformatics research (but also occasionally within it), the principal investigator of the research group may never have used Linux, may have a limited idea of the procedures, and may expect group members to ‘pick things up’ and deal with problems themselves. This requires researchers to have a high level of proficiency with Linux, including the ability to install both standard Linux packages and software for which no standard package may be available. A single taught module cannot prepare a student for all eventualities, but ought to leave the student with the basic skills and confidence to be able to discover solutions, and implement them, as required. Hence, a certain amount of system administration should appear in an undergraduate bioinformatics module for biologists.

Traditionally, the environment required for an undergraduate bioinformatics module has been created in one of four ways. Firstly, one may set up a central GNU/Linux server on the campus and allow students to connect by Secure Shell, ssh (‘the server approach’). The server will typically run either a standard Linux distribution or a specialist bioinformatics distribution such as NEBC Bio-Linux.[4] The server approach allows the instructor to have full control over the server, and allows students to connect from existing computing classrooms with little or no adjustment to the classroom software. For students to connect to the server via the intranet, classroom computers only require an ssh client, the X Window System (X11), and a means of file transfer such as secure copy (scp). Students may also connect to the server from home (typically requiring them to install virtual private network software in addition to ssh, X11 and an scp client) or elsewhere on campus.

Secondly, one may provide students with a virtual machine, consisting of an environment similar to that which they might experience on the Linux server but running on a classroom computer, either with a standard Linux distribution or a specialist bioinformatics distribution such as DNA Linux Virtual Desktop Edition[5] (‘the VM approach’). This has the advantage that students may be given administrator access to their virtual machine.

Thirdly, one may provide students with a Linux system on removable media (‘the USB stick approach’, for example[6]; where files and settings do not have to be saved, a DVD may be used instead[7]). So long as students have the media to hand, this allows them to boot into ‘their own’ Linux. As with the VM approach, students may be given administrator access. The additional advantage is that the media may be portable between computer classrooms and home computers, without requiring students to move virtual machine image files.

Fourthly, students may be loaned or required to buy laptops of a specific kind, with a suitable operating system, data and software installed (‘the laptop approach’). This avoids hardware incompatibilities that the USB stick approach may, in practice, experience.[6]

Because administrator access cannot be allowed, the server approach fails to give students experience of the standard mechanism of software installation. It also involves competition for resources such as CPU time, especially if the class is large or the server is also shared with research colleagues. The VM approach solves both these problems but is less portable. Although, in theory, students may transfer a VM from one computer to another (assuming the destination has the necessary virtualisation software installed), the task is non-trivial, and more time consuming than a simple transfer of data or documents. The USB stick approach reduces the portability problem, since it is trivial to move a USB stick from one computer to another. However, smooth operation on all hardware is not guaranteed and requires ongoing efforts from the developers of the Linux distribution as new hardware is released. The laptop approach avoids all these problems by providing a portable computer holding everything required for the course. However, it is expensive.
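As a rough illustration of how little the server approach asks of a classroom computer, the sketch below assembles the two commands a student would typically run: an ssh login with X11 forwarding, and an scp upload. The user name, host name and file names are hypothetical placeholders, not details of any real course server.

```python
import shlex


def ssh_command(user, host, x11=True):
    """Return the argument list for an interactive ssh login."""
    args = ["ssh"]
    if x11:
        # -X asks ssh to forward X11 so graphical programs display locally.
        args.append("-X")
    args.append(f"{user}@{host}")
    return args


def scp_upload_command(local_path, user, host, remote_path):
    """Return the argument list for copying a local file to the server."""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


if __name__ == "__main__":
    # Hypothetical account and server, used only for illustration.
    user, host = "student01", "bioinf.example.ac.uk"
    print(shlex.join(ssh_command(user, host)))
    print(shlex.join(scp_upload_command("sequences.fasta", user, host, "practical1/")))
```

The functions only build the command lines. Running them (for example with subprocess.run) needs a reachable server and, from off campus, usually a VPN connection as noted above.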

As a fifth approach, we propose loaning a Raspberry Pi computer[8] and associated peripherals to students for the duration of the course (‘the Raspberry Pi approach’). This includes a customised version of Linux, appropriate software and data. This allows students full administrator access to a suitable operating system, without the difficulties of the VM or USB stick approaches. Should the student accidentally damage critical files, the system can be re-written from a master image.
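Re-writing a damaged system from a master image amounts to a block-by-block copy followed by a check that the copy matches. The following is a minimal sketch of that idea, not the procedure used by the authors. The file names are hypothetical, and on a real system the target would be the SD card's device node, which must be identified with great care because writing to the wrong device destroys its contents.

```python
import hashlib
import os

CHUNK = 4 * 1024 * 1024  # copy in 4 MiB blocks


def sha256_prefix(path, length):
    """Return the SHA-256 of the first `length` bytes of a file or device."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as handle:
        while remaining > 0:
            block = handle.read(min(CHUNK, remaining))
            if not block:
                break
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()


def restore_image(image_path, target_path):
    """Copy the master image onto the target and verify the copy.

    Returns (bytes_written, verified). Only the first bytes_written bytes
    of the target are checked, since an SD card is usually larger than
    the image written to it.
    """
    written = 0
    expected = hashlib.sha256()
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            block = src.read(CHUNK)
            if not block:
                break
            dst.write(block)
            expected.update(block)
            written += len(block)
        dst.flush()
        os.fsync(dst.fileno())  # push the data out to the medium
    verified = sha256_prefix(target_path, written) == expected.hexdigest()
    return written, verified
```

A call such as restore_image("master.img", "/dev/sdX") would need administrator rights on the separate computer used to initialise the SD card. Both names here are placeholders.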

The Raspberry Pi Model B, with 256 MB (now 512 MB) RAM, an ARM11 CPU running at 700 MHz before overclocking and a VideoCore IV GPU, was released for public sale in 2012[9] and costs £28.07 or £31.20.[10] Though additional items are required to turn it into a functioning, general-purpose computer (case, charger, SD card, mouse, keyboard, monitor and cable; and an entirely separate computer for initialising the SD card), it is still relatively low-cost (Additional file 1: Table S1). The existence of the Raspberry Pi is partly a celebration of the early days of popular computing in the 1980s, and an attempt to recreate that excitement among young people today.[11] It is also a symptom of the rapidly decreasing costs and increasing performance of computer hardware. The Raspberry Pi uses an ARM CPU.[12] Because of their high performance-per-watt, ARM CPUs are frequently found in small electronic appliances such as mobile phones and tablets. With CPU innovation increasingly driven by such applications, as opposed to more traditional areas such as desktop, laptop and server computers, the prevalence and utility of ARM-based computer hardware is likely to increase. Indeed, ARM-based servers are starting to appear in data centres, due to their modest requirements for power.[13]

Additional file 1. Table S1
Example prices of Raspberry Pi peripherals we found to work well in practice. These are presented without any endorsement. A case for the Raspberry Pi (various models and suppliers; ~£5-£10), the Raspberry Pi itself (see main text) and a monitor are not shown. Standard consumer prices, including UK tax but excluding any delivery charge, were obtained from the Insight UK Web site or via the Amazon UK Web site on 7 April 2013.


Though far slower than current desktop and laptop computers, the Raspberry Pi is notably faster than the Cray 1 supercomputer[14], a marvel of computer speed in its day. This raises a valid question: how much computing power is actually required to teach undergraduates bioinformatics? We propose that the answer is, by current standards, ‘not much’. The Raspberry Pi is more than adequate for the task. The Raspberry Pi approach includes all the benefits of the laptop approach, above, but at lower cost. In addition, the Raspberry Pi is a new and exciting computer system, which in itself can add interest to the course.

A variety of operating systems is available for the Raspberry Pi.[15] These include Raspbian[16], which is based on Debian GNU/Linux.[17] Over 35,000 Debian software packages are available pre-compiled for Raspbian, including Web browsers, text editors, word processors, and a wide range of bioinformatics packages.[18] Other software will usually compile and run without problems. Some features of recent CPUs (e.g. 64-bit addressing or vector operations) are absent, but we have not found these to be at all necessary for our proposed use of the Raspberry Pi. The most serious limitation, affecting structure visualisation software in particular, is limited graphics performance. Even so, some structural visualisation software does work on the Raspberry Pi.[19] New system software, improving graphics performance by making better use of the Raspberry Pi’s GPU, is under development.[20]
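With administrator access, installing a pre-compiled package on Raspbian is a single apt-get command. The sketch below checks which programs are already on the PATH and builds the install command for the rest. The program-to-package mapping is an assumption chosen for illustration (the names are believed to match Debian packages, but availability on a given Raspbian release should be checked), and it is not a list taken from the 4273π image.

```python
import shutil

# Hypothetical mapping from a program name to the Debian package assumed
# to provide it; for illustration only.
PROGRAM_TO_PACKAGE = {
    "blastp": "ncbi-blast+",
    "clustalw": "clustalw",
    "seqret": "emboss",
}


def missing_programs(programs):
    """Return the programs from `programs` that are not found on the PATH."""
    return [name for name in programs if shutil.which(name) is None]


def apt_install_command(packages):
    """Return the argument list for installing packages with apt-get."""
    if not packages:
        raise ValueError("no packages given")
    return ["sudo", "apt-get", "install", *packages]


if __name__ == "__main__":
    missing = missing_programs(PROGRAM_TO_PACKAGE)
    if missing:
        packages = sorted({PROGRAM_TO_PACKAGE[name] for name in missing})
        print(" ".join(apt_install_command(packages)))
    else:
        print("All listed programs are already installed.")
```

The script prints the command and leaves the student to run it, which keeps the privileged step explicit.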

We provide 4273π, a customised version of Linux for Raspberry Pi computer hardware. 4273π includes an Open Access bioinformatics course, 4273π Bioinformatics for Biologists.

References

  1. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research 25 (17): 3389-3402. doi:10.1093/nar/25.17.3389. PMC PMC146917. PMID 9254694. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC146917. 
  2. "BLAST: Basic Local Alignment Search Tool". National Center for Biotechnology Information. http://blast.ncbi.nlm.nih.gov/Blast.cgi. 
  3. "The GNU Operating System and the Free Software Movement". Free Software Foundation, Inc. http://www.gnu.org/. 
  4. Field, D.; Tiwari, B.; Booth, T.; Houten, S.; Swan, D.; Bertrand, N.; Thurston, M. (2006). "Open software for biologists: from famine to feast". Nature Biotechnology 24 (7): 801–803. doi:10.1038/nbt0706-801. PMID 16841067. 
  5. Bassi, S.; Gonzalez, V.C. (2007). "DNALinux Virtual Desktop Edition". Nature Precedings. doi:10.1038/npre.2007.670.1. 
  6. "Bio-Linux 7". Environmental Omics Network. http://nebc.nerc.ac.uk/tools/bio-linux/live-usbkey. 
  7. Yu, G.; Wang, L.G.; Meng, X.H.; He, Q.Y. (2012). "LXtoo: an integrated live Linux distribution for the bioinformatics community". BMC Research Notes 5: 360. doi:10.1186/1756-0500-5-360. PMC PMC3461469. PMID 22813356. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3461469. 
  8. "Raspberry Pi". Raspberry Pi Foundation. https://www.raspberrypi.org/. 
  9. "The Raspberry Pi computer goes on general sale". BBC. 29 February 2012. http://www.bbc.com/news/technology-17190918. 
  10. "Raspberry Pi". Element14 Community. Premier Farnell plc. http://www.element14.com/community/community/raspberry-pi. 
  11. Roberts, Jonathan (1 May 2012). "Is the Raspberry Pi the future of computing?". TechRadar Pro. Future Publishing Limited. http://www.techradar.com/news/computing/pc/is-the-raspberry-pi-the-future-of-computing-1078276. 
  12. "ARM: The Architecture for the Digital World". ARM Ltd. http://www.arm.com/. 
  13. Latif, Lawrence (5 April 2013). "ARM sees its 32-bit chips being deployed in future servers: Not everything needs 64-bit addressing". The Inquirer. Incisive Business Media Limited. http://www.theinquirer.net/inquirer/news/2259386/arm-sees-its-32bit-chips-being-deployed-in-future-servers. 
  14. Nic (19 November 2012). "A Cray for $35". 2000 Nickels. http://2000nickels.com/blog/2012/11/19/a-cray-for-35-dollars/. 
  15. "Raspberry Pi - Operating system distributions". Raspberry Pi Foundation. https://www.raspberrypi.org/forums/viewforum.php?f=18. 
  16. "Raspbian". Mike Thompson and the Raspbian community. http://www.raspbian.org/. 
  17. "Debian". Software in the Public Interest, Inc. http://www.debian.org/. 
  18. Möller, S.; Krabbenhöft, H.N.; Tille, A.; Paleino, D.; Williams, A.; Wolstencroft, K.; Goble, C.; Holland, R.; Belhachemi, D.; Plessy, C. (2010). "Community-driven computational biology with Debian Linux". BMC Bioinformatics 11 (Suppl 12): S5. doi:10.1186/1471-2105-11-S12-S5. PMC PMC3040531. PMID 21210984. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040531. 
  19. O'Boyle, Noel (19 January 2013). "Chemistrify your Raspberry Pi Part III". Noel O'Blog. http://baoilleach.blogspot.co.uk/2013/01/chemistrify-your-raspberry-pi-part-iii_19.html. 
  20. Upton, Eben (24 May 2013). "Wayland project". Raspberry Pi Blog. http://www.raspberrypi.org/archives/4053. 

Notes

This presentation is faithful to the original, with only a few minor changes to formatting. In most of the article's references DOIs and PubMed IDs were not given; they've been added to make the references more useful. In some cases important information was missing from the references, and that information was added.