Journal:Using interactive digital notebooks for bioscience and informatics education

From LIMSWiki
Full article title Using interactive digital notebooks for bioscience and informatics education
Journal PLOS Computational Biology
Author(s) Davies, Alan; Hooley, Frances; Causey-Freeman, Peter; Eleftheriou, Iliada; Moulton, Georgina
Author affiliation(s) University of Manchester
Primary contact Email: alan dot davies-2 at manchester dot ac dot uk
Editors Ouellette, Francis
Year published 2020
Volume and issue 16(11)
Article # e1008326
DOI 10.1371/journal.pcbi.1008326
ISSN 1553-734X
Distribution license Creative Commons Attribution 4.0 International
Website https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008326
Download https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008326 (PDF)

Abstract

Interactive digital notebooks provide an opportunity for researchers and educators to carry out data analysis and report results in a single digital format. Beyond being digital, the format allows rich content to be created that interacts with the code and data in the notebook to form an educational narrative. This primer introduces some of the fundamental aspects of using Jupyter Notebook in an educational setting for teaching in the bioinformatics and health informatics disciplines. We also provide two case studies that detail 1. how we used Jupyter Notebooks to teach programming skills to non-coders on a blended master’s degree module for a health informatics program, and 2. a fully online distance learning unit on programming for a postgraduate certificate (PG Cert) in clinical bioinformatics, with a more technical audience.

Keywords: bioinformatics, health informatics, programming, data analysis, Jupyter Notebook, education

Introduction

Universities and other higher education institutions are under increasing pressure to provide more online and distance learning courses and to deliver them rapidly and cost-effectively.[1] Part of this demand comes from students who want more flexible study options than traditional course delivery offers, so that they can study around employment and family commitments. It is also driven by financial considerations, as online delivery lets institutions scale courses while managing infrastructure constraints (e.g., access to teaching rooms and limited capacity for face-to-face delivery).[2] To meet this challenge, we need tools that cater for students with varying levels of digital literacy and that reduce the burden of downloading and installing software, since installation requires support, which is harder to provide at a distance. This is further complicated when students use managed equipment (e.g., National Health Service [NHS] employees) and lack the administrator rights needed to install software.

Digital notebooks provided us with a way of meeting these needs, as they are easy to set up, straightforward to use, and can support rich and interactive content. Here, we present a primer on how to use digital notebooks (specifically Jupyter Notebooks) for teaching and assessment, along with details of two case studies where we used notebooks to teach Python programming and database skills for clinical bioinformatics and health informatics students of varying levels of technical experience. The case studies and methods presented can be applied to both distance learning and face-to-face teaching scenarios.

We start by covering what a Jupyter Notebook is, along with the different “cell” types available. We then look at how notebooks can be run and enhanced with extensions that add features such as exercise tasks and other interactivity, before looking at how they can be used in assessment. Next, we present two case studies in which we applied notebooks to teach different groups of students, as examples of the contexts in which they can be used. Finally, we synthesise our experiences of using notebooks to educate students, discuss their further potential, and offer considerations for education.

What is a Jupyter Notebook?

Jupyter Notebook is an open-source web application that runs in an internet browser. It allows the sharing of code, data analysis, data visualizations (which can be interactive), math formulas, and other embedded media (e.g., YouTube videos, images, and web links) in a single document that combines interactive and narrative components. The document is composed of multiple cells that hold the content of the notebook (Figure 1).


Fig. 1 A new Python 3 notebook with three empty cells denoted by the grey rectangles. The currently selected cell is highlighted in green.

Jupyter notebooks were created by Project Jupyter, whose website states that “Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.”[3] This work includes various standards for interactive computing, such as the notebook document format, which is based on JavaScript Object Notation (JSON). The name Jupyter is derived from the three languages initially supported: Julia, Python, and R.[4]
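Because a notebook (.ipynb) file is plain JSON, it can help students to see what one actually contains. The sketch below builds a minimal two-cell notebook by hand using only Python's standard json module, assuming nbformat version 4.4 (from minor version 5 onward, each cell also needs an "id" field). The file name week1.ipynb and the cell contents are illustrative placeholders; in practice the nbformat package is the usual way to create and validate notebooks programmatically.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout. Minor version 4 is used
# because it predates the per-cell "id" field required from 4.5 onward.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        # Tells Jupyter which kernel (language) should run the code cells.
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Week 1\n", "Run the cell below with **Shift+Enter**."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # set by Jupyter when the cell is run
            "outputs": [],  # filled in by Jupyter when the cell is run
            "source": ["print('Hello, notebook!')"],
        },
    ],
}

# Write the notebook to disk (hypothetical file name); Jupyter can open it.
with open("week1.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)
```

Opening week1.ipynb from the Jupyter file browser should show a Markdown cell followed by an empty-output code cell.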

Anatomy of a notebook

Jupyter notebooks can be used with many programming languages; over 40 are currently supported.[3] These include languages popular in data science, such as Julia, Python, and R (Figure 2).


Fig. 2 A simple function that returns the sum of two numbers, shown in notebooks using different kernels (programming languages): Python (left), Julia (middle), and R (right).

The notebooks are made up of units called “cells” that can be executed (run) in order to render their contents in different ways.

Cell types

There are two principal cell types. The first is the “Markdown” cell, which is used to present text, images, equations, and other resources. The second is the “code” cell, in which the user enters code written in the chosen programming language for execution in the notebook. To execute the contents of any cell, the user can press the Shift and Enter keys together or click the “Run” button in the toolbar at the top of the screen. If the cell being run is a code cell, its code is executed and any output is displayed immediately below it. This is indicated by the “In” and “Out” labels to the left of the cells, as seen in Figure 2.
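Figure 2 is not reproduced here, so as a rough sketch, a Python code cell like the one it describes might contain the following. The function name add_numbers is our own placeholder, not taken from the figure.

```python
# Contents of a single code cell. Running it in a notebook labels the cell
# In [n], where n counts executions in the session. Because the last line is
# a bare expression, its value (5) is echoed beside Out[n] below the cell.
def add_numbers(a, b):
    """Return the sum of two numbers."""
    return a + b


add_numbers(2, 3)
```

Anything printed with print() also appears below the cell, but without an Out label; only the value of a final bare expression is echoed next to Out.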

Styling cells

Markdown cells are styled with Markdown, a lightweight markup language for formatting text.[5] When the cell is run, the Markdown text is converted into HTML for display (Figure 3).


Fig. 3 Example of a Markdown cell (left) and the output of the styled cell when the cell is run (right).
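To make the Markdown-to-HTML step concrete, here is a deliberately tiny Python sketch that handles only ATX headings (#), **bold**, and *italic*. The helper markdown_to_html is hypothetical, written for illustration, and is not how Jupyter's own renderer works; real Markdown renderers implement far more of the syntax (lists, links, code spans, nested emphasis, and so on).

```python
import html
import re


def markdown_to_html(text):
    """Convert a tiny subset of Markdown (ATX headings, **bold**, *italic*)
    into HTML. A toy illustration only, not a full Markdown renderer."""
    lines_out = []
    for line in text.splitlines():
        # Escape <, >, & and quotes so plain text is not read as HTML tags.
        line = html.escape(line)
        # Inline emphasis: handle bold first so ** is not read as two *.
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        # One to six leading # characters followed by a space mark a heading.
        heading = re.match(r"(#{1,6}) (.*)", line)
        if heading:
            level = len(heading.group(1))
            lines_out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        elif line.strip():
            # Any other non-blank line becomes its own paragraph.
            lines_out.append(f"<p>{line}</p>")
    return "\n".join(lines_out)


print(markdown_to_html("# Results\n\nThe mean was **5**."))
# <h1>Results</h1>
# <p>The mean was <strong>5</strong>.</p>
```

Blank lines are simply skipped here, whereas a real renderer uses them to group consecutive lines into one paragraph.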

Markdown cells can also display plain text with no styling. Another useful feature for teaching math-based courses or sharing formulas is LaTeX support. LaTeX is a popular document preparation system built on TeX, the typesetting system originally developed by the American computer scientist Donald Knuth.[6] LaTeX is widely used in the scientific community (e.g., by computer scientists) to write academic publications such as journal and conference papers. LaTeX math notation can be added to Markdown cells to display formulas using common mathematical notation. For example, the code below produces the output seen in Figure 4:

$$
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^2}
$$


Fig. 4 Output of LaTeX math notation producing the formula for the population standard deviation.
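To connect the notation to computation, the sketch below implements the formula in Figure 4 directly in Python, where N is the number of values and μ is their mean. The function name population_sd and the sample data are illustrative; the standard library's statistics.pstdev computes the same quantity and is used here as a cross-check.

```python
import math
import statistics


def population_sd(values):
    """Population standard deviation:
    sigma = sqrt((1/N) * sum over i of (x_i - mu)^2)."""
    n = len(values)  # N, the number of values
    mu = sum(values) / n  # mu, the population mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # 2.0
print(statistics.pstdev(data))  # 2.0, the standard library equivalent
```

Dividing by N rather than N - 1 is what makes this the population standard deviation; statistics.stdev divides by N - 1 to give the sample standard deviation instead.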

The LaTeX wikibook math section[7] is a useful resource for learning about the math notation options available in LaTeX. Table 1 provides an overview of some of the useful Python libraries for numerical and scientific computing that can be incorporated into the notebook environment.


Table 1. Some useful Python libraries for numerical and scientific computing.

Running Jupyter Notebooks

There are different ways of accessing Jupyter Notebooks. The Anaconda Distribution,[8] a data science platform for Python and R, provides a free Python distribution that includes Jupyter Notebook. Another option is JupyterHub,[9] which lets groups of users access notebooks hosted in the cloud or on their own hardware. When Jupyter Notebook is launched, the user is greeted with a page showing the available files and folders (Figure 5). Selecting “New” from the menu creates a new notebook in the selected language; alternatively, an existing notebook (.ipynb) file can be opened by selecting it from the file list.


Fig. 5 The files and folders tab seen when launching Jupyter notebooks locally. A new notebook is created by selecting the new dropdown option and choosing the required language.

Jupyter Notebooks, JupyterLab, and JupyterHub

Project Jupyter has created several resources and services around the original notebooks, which can confuse beginners. The differences among them are briefly described here.

  1. Jupyter Notebook is an interactive computational web application that combines code, text, data analysis, and other media in a single document.
  2. JupyterLab builds on the original Jupyter Notebook to provide a web-based interactive development environment in which users can work with notebooks alongside data and file viewers, text editors, and terminals. This helps to integrate notebooks with other documents and resources in a single environment.
  3. JupyterHub allows multiple users (e.g., groups) to access notebooks and other resources. This can be useful for students and companies that want one or more groups to use a computational environment and resources without having to install and set things up. System administrators can manage these groups. Individual notebooks and JupyterLab can be accessed via the Hub, which can be run in the cloud or on a group's own hardware.

As these offerings build on the original notebook and have notebooks at their core, this article describes the notebooks themselves for beginners, rather than the additional platforms and services that incorporate them. Notebooks work in a similar way whether they are accessed on their own or via JupyterLab or JupyterHub. It is nonetheless worth being aware of these options for building and sharing resources around the notebooks you develop.

Notebook extensions

A number of “bolt-on” extensions exist for the notebooks. These can be extremely useful for adding features to a notebook. Examples include the ability to split a cell into two cells horizontally, a spellchecker, auto-numbering of equations, and an extension for making exercise tasks (discussed later). To install and enable these additional features, enter the following commands at a command prompt (e.g., the Anaconda Prompt or PowerShell):

pip install jupyter_contrib_nbextensions

jupyter contrib nbextension install --user

pip install jupyter_nbextensions_configurator

jupyter nbextensions_configurator enable --user

This enables the “NBextensions” tab (Figure 6). Clicking it presents the user with a series of checkboxes for the various extensions, each with a description and often with screenshots and/or animations previewing what the extension does.


Fig. 6 The NBextensions tab for selecting the various notebook extensions.

When a new notebook is opened, the selected extensions appear as small icon buttons under the main menu (Figure 7).


Fig. 7 Enabled notebook extension icons shown in red box.

Magic commands

References

  1. Gregory, J.; Salmon, G. (2013). "Professional development for online university teaching". Distance Education 34 (3): 256–70. doi:10.1080/01587919.2013.835771. 
  2. Georgina, D.A.; Olson, M.R. (2008). "Integration of technology in higher education: A review of faculty self-perceptions". The Internet and Higher Education 11 (1): 1–8. doi:10.1016/j.iheduc.2007.11.002. 
  3. "Jupyter". Project Jupyter. https://jupyter.org/. Retrieved 27 January 2020.
  4. Richardson, M.L.; Amini, B. "Scientific Notebook Software: Applications for Academic Radiology". Current Problems in Diagnostic Radiology 47 (6): 368–77. doi:10.1067/j.cpradiol.2017.09.005. PMID 29122394.
  5. Cone, M. "Markdown Guide". https://www.markdownguide.org/. Retrieved 23 September 2019.
  6. "An introduction to LaTeX". The LaTeX Project. https://www.latex-project.org/about/. Retrieved 27 January 2020.
  7. Roberts, A.; Oetiker, T.; Partl, H. et al. "LaTeX/Mathematics". LaTeX - WikiBooks. https://en.wikibooks.org/wiki/LaTeX/Mathematics. Retrieved 27 January 2020.
  8. "Anaconda Distribution". Anaconda, Inc. 2019. Archived from the original on 06 September 2019. https://web.archive.org/web/20190906215334/https://www.anaconda.com/distribution/. Retrieved 21 September 2019. 
  9. "JupyterHub". Project Jupyter. https://jupyter.org/hub. Retrieved 27 January 2020. 

Notes

This presentation attempts to remain faithful to the original, with only a few minor changes to presentation. Grammar and punctuation have been updated reasonably to improve readability. In some cases important information was missing from the references, and that information was added. The original URL for the Anaconda Distribution is dead; an archived version of the URL was used for this version.