Journal:University-level practical activities in bioinformatics benefit voluntary groups of pupils in the last 2 years of school

From LIMSWiki
Revision as of 21:22, 11 November 2015 by Shawndouglas (talk | contribs) (Added content. Saving and adding more.)
Full article title: University-level practical activities in bioinformatics benefit voluntary groups of pupils in the last 2 years of school
Journal: International Journal of STEM Education
Author(s): Barker, Daniel; Alderson, Rosanna G.; McDonagh, James L.; Plaisier, Heleen; Comrie, Muriel M.; Duncan, Leigh; Muirhead, Gavin T.P.; Sweeney, Stuart D.
Author affiliation(s): University of St. Andrews, University of Manchester, Kilgraston School, Forfar Academy, Portlethen Academy
Primary contact: Email: db60@st-andrews.ac.uk
Year published: 2015
Volume and issue: 2
Page(s): 17
DOI: 10.1186/s40594-015-0030-z
ISSN: 2196-7822
Distribution license: Creative Commons Attribution 4.0 International
Website: http://www.stemeducationjournal.com/content/2/1/17
Download: http://www.stemeducationjournal.com/content/pdf/s40594-015-0030-z.pdf (PDF)

Abstract

Background: Bioinformatics — the use of computers in biology — is of major and increasing importance to biological sciences and medicine. We conducted a preliminary investigation of the value of bringing practical, university-level bioinformatics education to the school level. We conducted voluntary activities for pupils at two schools in Scotland (years S5 and S6; pupils aged 15–17). We used material originally developed for an optional final-year undergraduate module and now incorporated into 4273π, a resource for teaching and learning bioinformatics on the low-cost Raspberry Pi computer.

Results: Pupils’ feedback forms suggested our activities were beneficial. The forms provide strong evidence of increases, over the course of the activity, in the following: pupils’ perception of the value of computers within biology; their knowledge of the Linux operating system and the Raspberry Pi; their willingness to use computers rather than phones or tablets; their ability to program a computer; and their ability to analyse DNA sequences with a computer. We found no strong evidence of negative effects.

Conclusions: Our preliminary study supports the feasibility of bringing university-level, practical bioinformatics activities to school pupils.

Keywords: Bioinformatics; Computational biology; Secondary school; Raspberry Pi; Open access teaching material; Case study

Findings

Introduction

Progress in Science, Technology, Engineering, Mathematics and Medicine (STEMM) subjects is increasingly dominated by computational analyses. In biological sciences, for example, the exceptional pace of recent advances in technology for DNA and genome sequencing has created a demand for computationally able researchers, to analyse the large amounts of data produced. A field specialising in application of computation to biological problems has emerged, known as bioinformatics. The development of bioinformatics is discussed by Hogeweg[1], and university-level bioinformatics education has been reviewed by Magana et al.[2]

DNA sequences and related data are available at low cost (for new sequencing work) or free in online databases such as GenBank (Benson et al.[3]), Ensembl (Cunningham et al.[4]) and hundreds of others (Galperin et al.[5]). Software for bioinformatics research is usually free, for example the very widely used sequence database search software, BLAST (Altschul et al.[6]). Free resources are also available for bioinformatics teaching and learning, for example 4273π (Barker et al.[7]), Bioinformática na escola (Marques et al.[8]), GOBLET (Corpas et al.[9]), Bioinformatics@school and the EvoEd Digital Library. These publicly available data, software and materials present excellent opportunities for relatively low-cost teaching.

There has been a recent, encouraging increase in exposure of school pupils to bioinformatics (e.g. Gallagher et al.[10]; Lewitter and Bourne[11]; McQueen et al.[12]; Kovarik et al.[13]; Machluf and Yarden[14]; Wood and Gebhardt[15]; Marques et al.[8]; Toby and Pope[16]). Genomics and associated topics have started to appear in many official school curricula, for example in Scotland (see “Discussion”, below), the Netherlands (College voor Examens, p. 17[17]) and the USA (Wefer and Sheppard[18]). From a different angle, computer science is now a major part of the primary school curriculum for England. This is in line with a “back to basics” approach to computing currently emerging, as opposed to more traditional information and communications technology (ICT). In the UK, this change has been particularly associated with the low-cost Raspberry Pi computer, which is suitable for educational projects in electronics and engineering as well as general use and has sold over 5 million units. However, a practical link between computers and STEMM — which we will refer to as computational science, as opposed to computer science — still does not feature strongly on the UK school curriculum. DNA sequencing has a pervasive and increasing influence across traditionally disparate subject areas, including biochemistry, biomedical research, clinical medicine, evolutionary biology, ecology, neuroscience and anthropology. DNA sequencing is used to diagnose genetic and infectious diseases, discover drugs, characterise environments, monitor the progress of cancers, identify species and reveal evolutionary patterns. We consider increased amounts of practical bioinformatics at school to be a priority.

Motivated by the increasing importance of bioinformatics to the life sciences and its appearance on school curricula, we conducted a preliminary investigation of the benefits of bringing university-level bioinformatics teaching material to voluntary groups of children in the last 2 years of school in Scotland (S5 and S6; pupils aged 15–17). The material was originally developed for an optional, final-year undergraduate module at the University of St Andrews, BL4273 Bioinformatics for Biologists. To better match bioinformatics as it is actually used in research at universities, institutes and industry, the material uses the Linux operating system, in this case a variant of Raspbian Linux running on low-cost Raspberry Pi hardware. This material has been released under an open access licence, as part of 4273π (Barker et al.[7]). Our proposition was that school pupils can benefit from practical, undergraduate-level bioinformatics teaching material. Compared to the undergraduates for whom this material was originally developed, school pupils are less experienced and knowledgeable about biology in general. However, their levels of practical bioinformatics experience are broadly similar: zero in the case of the school pupils, and approximately ten actual contact hours among undergraduates at the time of starting the module.

Many of the skills developed in our activities, and in 4273π or bioinformatics in general, are generic skills in computational science. For example, although the programming language taught in the “INTRO” component, Perl, is particularly widely used in bioinformatics (e.g. Stajich et al.[19]; Stabenau et al.[20]), it is structurally similar to other programming languages widely used in science, including C, Fortran, Java, Python and R. Use of the command line, emphasised in 4273π, is also essential in computational physics, computational chemistry and, indeed, computer science. Although computational chemistry is not yet part of the Higher qualification in Chemistry, several simulations are suggested by the Scottish Qualifications Authority.[21] Computational skills, as taught in 4273π, will be valuable to students taking chemistry, physics and other STEMM subjects at university.

Judged by pupil self-assessment forms, our preliminary trial was a success, though caution is required due to the small sample size. We will continue developing peer-reviewed bioinformatics material, targeted at school pupils and/or undergraduates, and applying it in practice. This will simultaneously lead to expansion of the 4273π resource and the gathering of larger, more complex and conclusive educational data at a future date. 4273π itself, and links to relevant social media groups, may be found at http://4273pi.org.

Methods

Two activities were carried out, each using a voluntary group of seven pupils studying science from a single school in Scotland. One group was from Kilgraston, an independent girls’ school, and the other was from Forfar Academy, a comprehensive school. In the case of Kilgraston, five pupils were at S5 and two were at S6 level, and instruction and assistance were provided by D.B., M.M.C., G.T.P.M. and H.P. In the case of Forfar Academy, all pupils were at S5 level, of whom two were girls and five were boys, and instruction and assistance were provided by R.G.A., D.B., L.D., J.L.M. and S.D.S. Generally, university staff or PhD students provided detailed instruction on the bioinformatics activity, and school staff highlighted links to material already taught and the curriculum. With a combination of university staff or PhD students and school staff, students were guided through the practical material of two components (modules) of 4273π Bioinformatics for Biologists. At Kilgraston, the event was held at the school, occupying an entire day on which no other classes were scheduled. With Forfar Academy, the event was held at the University of St Andrews, where students participated in an afternoon and evening session, primarily held in the same room used, at other times, by undergraduates on the BL4273 module. Refreshment breaks were included, using the school’s usual facilities (Kilgraston) or the Bell Pettigrew Museum (St Andrews). In total, the teaching time was approximately 4 h.

Raspberry Pi Model B hardware was used, one per student (plus one connected to a projector for demonstration). Prior to the first event, at Kilgraston, tasks were selected from existing material in discussion between D.B. and H.P. (familiar with 4273π) and M.M.C. (familiar with the school curriculum). For both groups of pupils, the first task corresponded to the “INTRO” component of 4273π Bioinformatics for Biologists, providing an introduction to the Raspberry Pi computer hardware, the Linux command line, BLAST sequence similarity search software and programming in the Perl language. The second task corresponded to the “DNA” component, involving an introduction to the FlyBase database (Dos Santos et al.[22]) and genome annotation with BLAST (Altschul et al.[6]), GeneWise (Birney et al.[23]) and SNAP (Korf[24]). Hard-copy handouts were provided. The handouts for Kilgraston and Forfar Academy were identical in content apart from date, location of the event, staff details and location of files (~/kilgraston or ~/forfar_academy). For the record, the specific handouts used are available as Additional file 1 (Kilgraston) and Additional file 2 (Forfar Academy), but with names and contact details redacted. The latest, open access versions of these will be found in 4273π.

Hard-copy, paired, “before” (prior to the use of the computers) and “after” questionnaires were used for pupils and school staff, involving questions on a 1–5 Likert scale for self-assessment of attitudes and free text (Table 1; Additional file 3). In preparing the questionnaires, for those questions on a Likert scale, the sequence of questions was randomised and the sense of each question (“1” corresponding to “good” on our subjective scale, vs “1” corresponding to “bad”) was randomised. The same sequence and sense were used for each questionnaire handed out; within each group (pupils or staff), the sequence of paired questions was the same “before” and “after”. Results of the questions on the Likert scale were summarised per question as a bar chart, and as a likelihood ratio sign test for evidence of systematic change over the course of the activity. We apply a likelihood approach to statistical inference (Birnbaum[25]; Edwards[26]; Royall[27]; Barker[28]). In common with other approaches to statistical inference, this provides no absolute threshold beyond which evidence is considered conclusive. By convention, we define “strong” evidence as a log (ln) likelihood ratio, Δℓ, of at least 2, or a likelihood ratio of at least 8 (Edwards, pp. 199–202[26]; Royall[27]). Were Δℓ converted to a p value under the assumptions of a likelihood ratio test (Wilks[29]), then for one free parameter Δℓ ≥ 2 corresponds to p ≤ 0.046, approximately the traditional threshold for statistical significance prior to any correction for multiple testing (i.e. p < 0.05). Calculations were performed in R (R Development Core Team[30]).
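The correspondence between Δℓ and a p value quoted above can be checked with a short calculation. The sketch below is written in Python for illustration, whereas the authors' own calculations were performed in R; the function name `delta_l_to_p` is hypothetical. It assumes the large-sample result of Wilks, under which 2Δℓ is approximately chi-squared distributed with one degree of freedom, so the upper-tail probability reduces to erfc(√Δℓ).

```python
import math


def delta_l_to_p(delta_l: float) -> float:
    """Convert a log likelihood ratio to an approximate p value.

    Assumes one free parameter, so that 2 * delta_l is treated as a
    chi-squared statistic with 1 degree of freedom. For 1 degree of
    freedom the upper-tail probability at x equals erfc(sqrt(x / 2)),
    which is erfc(sqrt(delta_l)) when x = 2 * delta_l.
    """
    if delta_l < 0:
        raise ValueError("delta_l must be non-negative")
    return math.erfc(math.sqrt(delta_l))


if __name__ == "__main__":
    # The conventional threshold for "strong" evidence used in the text.
    print(delta_l_to_p(2.0))
    # The likelihood ratio that corresponds exactly to a delta_l of 2.
    print(math.exp(2.0))
```

For Δℓ = 2 this gives a p value of about 0.0455, in line with the p ≤ 0.046 stated above. Note that exp(2) is about 7.39, so the two conventional thresholds (Δℓ of at least 2, likelihood ratio of at least 8) are close but not identical.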

Table 1. Analysis of before and after questions for pupils on a Likert scale (1 “strongly disagree” to 5 “strongly agree”). For each pair of responses for each question, for each pupil, the change from before to after was noted, if any, and was reduced to a binary variable, indicating increase or decrease on the Likert scale. The proportion of changes that were increases and the proportion that were decreases constitute our maximum likelihood (ML) estimates for the probability of increase and probability of decrease, conditional on a change occurring. Whether the majority of changes are in the direction of improvement is indicated, with the direction indicating improvement being a subjective judgement by the authors in the context of this study. Assuming a binomial distribution, for each question separately, the likelihood of the observed changes was calculated, firstly, assuming the ML estimates for probability of increase and decrease obtained from the data; and secondly, assuming an extrinsic hypothesis that the probability of increase and probability of decrease are equal (0.5). From these likelihoods, the likelihood ratio and its natural logarithm (Δℓ) were calculated. Rows marked with an asterisk (*) show strong evidence of overall change in a specific direction over the course of the activity (Δℓ ≥ 2). N = 12 pupils submitted both “before” and “after” questionnaires.

Question | Increases | Decreases | Increases proportion | Decreases proportion | Most changes were improvements? | Likelihood ratio | Δℓ
1. I will end up working in science. | 1 | 1 | 0.5 | 0.5 | No | 1 | 0
2. I think computers are useful within biology. * | 5 | 0 | 1 | 0 | Yes | 32 | 3.47
3. I have heard of Linux. * | 10 | 0 | 1 | 0 | Yes | 1024 | 6.93
4. I do not expect to enjoy [after: did not enjoy] the activity today. | 4 | 3 | 0.57 | 0.43 | No | 1.07 | 0.07
5. I know more about computers than most adults do. | 4 | 2 | 0.67 | 0.33 | Yes | 1.40 | 0.34
6. I know more about computers than my teachers do. | 2 | 0 | 1 | 0 | Yes | 4 | 1.39
7. I am not intending to go to university. | 0 | 0 | n/a | n/a | n/a | 1 | 0
8. I have not heard of the Raspberry Pi. * | 1 | 10 | 0.09 | 0.91 | Yes | 71.78 | 4.27
9. I think computers are useful within sciences other than biology. | 2 | 0 | 1 | 0 | Yes | 4 | 1.39
10. I would rather not use a computer, I would prefer to use a phone or a tablet. * | 0 | 3 | 0 | 1 | Yes | 8 | 2.08
11. I cannot program a computer. * | 1 | 8 | 0.11 | 0.89 | Yes | 22.17 | 3.10
12. I am not interested in biology. | 1 | 0 | 1 | 0 | No | 2 | 0.69
13. I do not enjoy using computers for fun. | 1 | 2 | 0.33 | 0.67 | Yes | 1.19 | 0.17
14. I am good at using a computer to analyse DNA sequences. * | 8 | 1 | 0.89 | 0.11 | Yes | 22.17 | 3.10

n/a: not available
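The calculation described in the caption of Table 1 can be sketched as follows. This is an illustrative reconstruction in Python, not the authors' R code; the function names `count_changes` and `sign_test_delta_l` are hypothetical, and the paired responses in the usage section are invented purely to show the counting step. Because the binomial coefficient is the same under both hypotheses, it cancels in the ratio and is omitted.

```python
import math


def count_changes(before, after):
    """Reduce paired Likert responses to counts of increases and decreases.

    Pairs with no change are ignored, as in the caption of Table 1.
    """
    increases = sum(1 for b, a in zip(before, after) if a > b)
    decreases = sum(1 for b, a in zip(before, after) if a < b)
    return increases, decreases


def sign_test_delta_l(increases, decreases):
    """Return (likelihood ratio, delta_l) for ML estimate versus p = 0.5."""
    n = increases + decreases
    if n == 0:
        # No changes observed: both hypotheses fit equally well.
        return 1.0, 0.0

    def log_lik(p_increase):
        # Skip terms with a zero count, so 0 * log(0) is treated as 0.
        total = 0.0
        if increases:
            total += increases * math.log(p_increase)
        if decreases:
            total += decreases * math.log(1.0 - p_increase)
        return total

    p_hat = increases / n  # ML estimate of the probability of increase
    delta_l = log_lik(p_hat) - log_lik(0.5)
    return math.exp(delta_l), delta_l


if __name__ == "__main__":
    # Hypothetical paired responses for five pupils (not data from the study).
    before = [3, 2, 4, 3, 1]
    after = [4, 4, 4, 2, 2]
    print(count_changes(before, after))

    # Counts taken from Table 1, question 8 (1 increase, 10 decreases).
    print(sign_test_delta_l(1, 10))
```

Applied to the counts for question 8, this gives a likelihood ratio of about 71.78 and a Δℓ of about 4.27, matching the values reported in the table.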

References

  1. Hogeweg, P. (2011). "The roots of bioinformatics in theoretical biology". PLOS Computational Biology 7 (3): e1002021. doi:10.1371/journal.pcbi.1002021. PMC PMC3068925. PMID 21483479. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3068925. 
  2. Magana, A.J.; Taleyarkhan, M.; Alvarado, D.R.; Kane, M.; Springer, J.; Clase, K. (2014). "A survey of scholarly literature describing the field of bioinformatics education and bioinformatics educational research". CBE – Life Sciences Education 13 (4): 607–23. doi:10.1187/cbe.13-10-0193. PMC PMC4255348. PMID 25452484. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255348. 
  3. Benson, D.A.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Sayers, E.W. (2015). "GenBank". Nucleic Acids Research 43 (Database issue): D30–D35. doi:10.1093/nar/gku1216. PMC PMC4383990. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383990. 
  4. Cunningham, F.; Amode, M.R.; Barrell, D., et al. (2015). "Ensembl 2015". Nucleic Acids Research 43 (Database issue): D662-D669. doi:10.1093/nar/gku1010. 
  5. Galperin, M.Y.; Rigden, D.J.; Fernández-Suárez, X.M. (2015). "The 2015 Nucleic Acids Research Database Issue and Molecular Biology Database Collection". Nucleic Acids Research 43 (Database issue): D1-D5. doi:10.1093/nar/gku1241. PMC PMC4383995. PMID 25593347. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383995. 
  6. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. (1997). "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs". Nucleic Acids Research 25 (17): 3389-3402. doi:10.1093/nar/25.17.3389. PMC PMC146917. PMID 9254694. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC146917. 
  7. Barker, D.; Ferrier, D.E.K.; Holland, P.W.H.; Mitchell, J.B.O.; Plaisier, H.; Ritchie, M.G.; Smart, S.D. (2013). "4273π: Bioinformatics education on low cost ARM hardware". BMC Bioinformatics 14: 243. doi:10.1186/1471-2105-14-243. PMC PMC3751261. PMID 23937194. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751261. 
  8. Marques, I.; Almeida, P.; Alves, R.; João Dias, M.; Godinho, A.; Pereira-Leal, J.B. (2014). "Bioinformatics Projects Supporting Life-Sciences Learning in High Schools". PLOS Computational Biology 10 (1): e1003404. doi:10.1371/journal.pcbi.1003404. PMC PMC3900377. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3900377. 
  9. Corpas, M.; Jimenez, R.C.; Bongcam-Rudloff, E.; et al. (2015). "The GOBLET training portal: a global repository of bioinformatics training materials, courses and trainers". Bioinformatics 31 (1): 140–142. doi:10.1093/bioinformatics/btu601. PMC PMC4271145. PMID 25189782. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271145. 
  10. Gallagher, S.R.; Coon, W.; Donley, K.; Scott, A.; Goldberg, D.S. (2011). "A first attempt to bring computational biology into advanced high school biology classrooms". PLOS Computational Biology 7 (10): e1002244. doi:10.1371/journal.pcbi.1002244. PMC PMC3203055. PMID 22046118. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203055. 
  11. Lewitter, F.; Bourne, P.E. (2011). "Teaching bioinformatics at the secondary school level". PLOS Computational Biology 7 (10): e1002242. doi:10.1371/journal.pcbi.1002242. PMC PMC3203059. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3203059. 
  12. McQueen, J.; Wright, J.J.; Fox, J.A. (2012). "Design and implementation of a genomics field trip program aimed at secondary school students". PLOS Computational Biology 8 (8): e1002636. doi:10.1371/journal.pcbi.1002636. PMC PMC3431290. PMID 22956895. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431290. 
  13. Kovarik, D.N.; Patterson, D.G.; Cohen, C.; Sanders, E.A.; Peterson, K.A.; Porter, S.G.; Chowning, J.T. (2013). "Bioinformatics education in high school: implications for promoting science, technology, engineering, and mathematics careers". CBE – Life Sciences Education 12 (3): 441–59. doi:10.1187/cbe.12-11-0193. PMC PMC3763012. PMID 24006393. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3763012. 
  14. Machluf, Y.; Yarden, A. (2013). "Integrating bioinformatics into senior high school: design principles and implications". Briefings in Bioinformatics 14 (5): 648-60. doi:10.1093/bib/bbt030. PMID 23665511. 
  15. Wood, L.; Gebhardt, P. (2013). "Bioinformatics goes to school – new avenues for teaching contemporary biology". PLOS Computational Biology 9 (6): e1003089. doi:10.1371/journal.pcbi.1003089. PMC PMC3681668. PMID 23785266. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3681668. 
  16. Toby, I.; Pope, A. (2014). "Bioinformatics tools available for K-12 students to engage in research". Aviation, Space, and Environmental Medicine 85 (4): 484-485. doi:10.3357/ASEM.3953.2014. PMID 24754215. 
  17. College voor Examens (April 2014). "Biologie VWO. Syllabus Centraal Examen 2016" (PDF). pp. 59. http://www.examenblad.nl/examenstof/syllabus-2016-biologie-vwo-nader/2016/vwo/f=/biologie_vwo_2016_def_voor_hervaststelling.pdf. Retrieved 21 October 2015. 
  18. Wefer, S.H.; Sheppard, K. (2008). "Bioinformatics in high school biology curricula: a study of state science standards". CBE – Life Sciences Education 7 (1): 155–162. doi:10.1187/cbe.07-05-0026. PMC PMC2262119. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2262119. 
  19. Stajich, J.E.; Block, D.; Boulez, K.; et al. (2002). "The BioPerl toolkit: Perl modules for the life sciences". Genome Research 12 (10): 1611–1618. doi:10.1101/gr.361602. PMC PMC187536. PMID 12368254. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC187536. 
  20. Stabenau, A.; McVicker, G.; Melsopp, C.; Proctor, G.; Clamp, M.; Birney, E. (2004). "The Ensembl core software libraries". Genome Research 14 (5): 929–933. doi:10.1101/gr.1857204. PMC PMC479122. PMID 15123588. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC479122. 
  21. Scottish Qualifications Authority (2015). "Higher Chemistry Course Support Notes" (PDF). pp. 98. http://www.sqa.org.uk/files_ccc/CfE_CourseUnitSupportNotes_Higher_Sciences_Chemistry.pdf. Retrieved 20 October 2015. 
  22. Dos Santos, G.; Schroeder, A.J.; Goodman, J.L.; Strelets, V.B.; Crosby, M.A.; Thurmond, J.; Emmert, D.B.; Gelbart, W.M.; The FlyBase Consortium (2015). "FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations". Nucleic Acids Research 43 (Database issue): D690-7. doi:10.1093/nar/gku1099. PMC PMC4383921. PMID 25398896. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383921. 
  23. Birney, E.; Clamp, M.; Durbin, R. (2004). "GeneWise and Genomewise". Genome Research 14 (5): 988–95. doi:10.1101/gr.1865504. PMC PMC479130. PMID 15123596. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC479130. 
  24. Korf, I. (2004). "Gene finding in novel genomes". BMC Bioinformatics 5: 59. doi:10.1186/1471-2105-5-59. PMC PMC421630. PMID 15144565. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC421630. 
  25. Birnbaum, A. (1962). "On the Foundations of Statistical Inference". Journal of the American Statistical Association 57 (298): 269-306. doi:10.1080/01621459.1962.10480660. 
  26. Edwards, A.W.F. (1992). Likelihood. Baltimore, MD: Johns Hopkins University Press. pp. 296. ISBN 9780801844430. 
  27. Royall, R. (1997). Statistical evidence: a likelihood paradigm. Boca Raton, FL: Chapman and Hall/CRC. pp. 191. ISBN 9780412044113. 
  28. Barker, D. (2015). "Seeing the wood for the trees: philosophical aspects of classical, Bayesian and likelihood approaches in statistical inference and some implications for phylogenetic analysis". Biology & Philosophy 30 (4): 505–525. doi:10.1007/s10539-014-9455-x. 
  29. Wilks, S.S. (1938). "The large-sample distribution of the likelihood ratio for testing composite hypotheses". Annals of Mathematical Statistics 9 (1): 60-62. doi:10.1214/aoms/1177732360. 
  30. R Development Core Team (2015). "R: A language and environment for statistical computing". The R Foundation for Statistical Computing. https://www.r-project.org/. Retrieved 20 July 2015. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In several cases citation information was missing and was added to make the reference more useful.