Journal:Welcome to Jupyter: Improving collaboration and reproduction in psychological research by using a notebook system

From LIMSWiki
Revision as of 19:20, 18 June 2018 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title Welcome to Jupyter: Improving collaboration and reproduction in psychological research by using a notebook system
Journal The Quantitative Methods for Psychology
Author(s) Sprengholz, Phillipp
Author affiliation(s) Friedrich-Schiller-Universität Jena
Editors Cousineau, Denis
Year published 2018
Volume and issue 14(2)
Page(s) 137–46
DOI 10.20982/tqmp.14.2.p137
ISSN 2292-1354
Distribution license Creative Commons Attribution 4.0 International
Website http://www.tqmp.org/RegularArticles/vol14-2/p137/
Download https://www.tqmp.org/RegularArticles/vol14-2/p137/p137.pdf (PDF)

Abstract

The reproduction of findings from psychological research has proven difficult. Abstract descriptions of the data analysis steps performed by researchers are one of the main reasons why reproducing or even understanding published findings is so hard. With the introduction of Jupyter Notebook, a new tool for the organization of both static and dynamic information became available. The software allows blending explanatory content like written text or images with code for preprocessing and analyzing scientific data. Thus, Jupyter helps document the whole research process, from ideation through data analysis to the interpretation of results. This fosters both collaboration and scientific quality by helping researchers to organize their work. This tutorial is an introduction to Jupyter. It explains how to set up and use the notebook system. While introducing its key features, the advantages of using Jupyter Notebook for psychological research become obvious.

Keywords: Reproducible research, interactive scientific computing, collaboration, notebook systems, data management

Introduction

The replicability of psychological research has been questioned increasingly.[1][2][3] Reproducing or even understanding research findings requires extensive knowledge about the experimental manipulations and methods used.[4] Unfortunately, many research publications fail to describe the research process in detail, are difficult to understand without background information, or facilitate misinterpretation.[5] Most articles only include very abstract descriptions of data preparation and analysis steps, making them hard for the reader to follow. Consequently, reproducing results from psychological journals is practically impossible.[6] The scientific community has tried to solve these problems by publishing supplemental information online. This includes raw data as well as detailed descriptions of data preprocessing and analysis steps. Unfortunately, this information is often organized in a confusing way.

That’s why a group of scientists developed Jupyter, a web application based on IPython.[7] Jupyter enables users to create and share notebooks containing text, visualizations, equations, raw data, and code for analyzing and transforming this data. By blending static content like explanatory text and images with dynamic output of calculations and data analysis procedures, the notebooks emphasize the prose-first approach originally introduced by Mathematica Notebooks more than 20 years ago. The entire research process—including ideation, data acquisition, analysis, and interpretation of results—can be documented in a linear, story-like way. Publishing these notebooks alongside or instead of read-only journal articles may enhance both replication of results and collaboration between researchers.

This tutorial is written for readers with no previous experience using Jupyter. It explains how to set up and use Jupyter's notebooks for organizing, performing, and documenting data analysis tasks common in psychological research. Jupyter supports more than 90 programming languages, thus enabling you to analyze data using scripts written in Python, R, or virtually any other non-proprietary scripting language. However, this article will strictly focus on R. After setting up the system, an example notebook will be created step by step.

Setting up Jupyter

Setting up Jupyter on your local computer involves three steps. First, Python needs to be installed, as it is required to run the notebook system. Afterwards, Jupyter is downloaded. Finally, R is installed and configured to work with Jupyter. All three steps are detailed in the following. Since most readers are assumed to work on Microsoft Windows, the explanations are tailored to this operating system. However, Jupyter can also be set up on both macOS and Linux, and the steps to perform the installation are nearly identical.

Step 1: Installing Python

Download the latest Python 3 installer from Python.org (current version is 3.6.4). When starting the installer, use default settings, but make sure Python is added to your system's path variable (see Figure 1).


Figure 1. Python installer
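
Once the installer has finished, a quick check can confirm that the interpreter is recent enough and reachable through the path variable. The following is a minimal sketch and not part of the original tutorial; the helper names python_is_recent and python_on_path are hypothetical, and the 3.6 threshold is an assumption taken from the version mentioned above.

```python
import shutil
import sys


def python_is_recent(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def python_on_path():
    """Return the path of the first Python executable found on PATH, or None."""
    # On Windows the executable is usually "python"; "python3" is common elsewhere.
    return shutil.which("python") or shutil.which("python3")


print("Python version:", sys.version.split()[0])
print("Recent enough:", python_is_recent())
print("Found on PATH:", python_on_path())
```

If the last line prints None, Python was probably not added to the path variable during installation, and the pip command used in the next step may not be found either.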

Step 2: Installing Jupyter

After Python has been installed, a command window needs to be opened. Press the Win + R keys on your keyboard, type cmd and press Enter. Afterwards, enter the following line into the command window and press Enter again: pip install jupyter
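
To confirm that the installation succeeded, one option is to ask Python whether the relevant packages can be found. This is a hedged sketch rather than an official procedure: the function name jupyter_status is hypothetical, and it assumes that pip install jupyter provides the notebook and jupyter_core packages as well as a jupyter command.

```python
import importlib.util
import shutil


def jupyter_status():
    """Report whether the Jupyter packages and launcher appear to be installed."""
    return {
        # True if the package can be located by this interpreter.
        "notebook_package": importlib.util.find_spec("notebook") is not None,
        "jupyter_core_package": importlib.util.find_spec("jupyter_core") is not None,
        # Path of the "jupyter" command if it is reachable from PATH, else None.
        "jupyter_command": shutil.which("jupyter"),
    }


for name, value in jupyter_status().items():
    print(f"{name}: {value}")
```

The script only inspects the environment and changes nothing, so it can be run repeatedly, for example before and after the pip command.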

Step 3: Installing R and the R kernel

Download the latest R installer from R-Project.org (current version is 3.4.4). Make sure to select the base installation for Windows. Run the installer using default settings afterwards.

Finally, Jupyter must be connected with R by installing the R kernel. Open the R console by starting R.exe (to be found under C:\Program Files\R\R-3.4.4\bin; the folder name reflects the installed R version). Copy the following command into the console window and press Enter: install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))

This downloads a set of packages required by the R kernel. You may be asked to create a personal library; respond with "yes." If you are asked to select a CRAN mirror, select a mirror close to your current location, as this accelerates the download. While retrieving the packages, several warnings may be printed in the console window. They can be ignored. After all packages have been downloaded, execute the following command in the R console: devtools::install_github('IRkernel/IRkernel')

This installs the R kernel. Any warning messages that appear can again be ignored. Afterwards, we need to make sure Jupyter identifies the newly installed kernel. To do so, register its spec by executing the following command in the R console: IRkernel::installspec()
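
Whether the registration worked can be checked from the Python side by listing the kernels Jupyter knows about. The sketch below is an illustration under stated assumptions: it relies on the jupyter kernelspec list --json command, assumes that IRkernel::installspec() registers the kernel under its default name "ir", and uses the hypothetical helper names kernel_names and installed_kernels.

```python
import json
import subprocess


def kernel_names(kernelspec_json):
    """Extract the sorted kernel names from `jupyter kernelspec list --json` output."""
    data = json.loads(kernelspec_json)
    return sorted(data.get("kernelspecs", {}))


def installed_kernels():
    """Ask Jupyter for its registered kernels; return [] if that is not possible."""
    try:
        result = subprocess.run(
            ["jupyter", "kernelspec", "list", "--json"],
            capture_output=True, text=True, check=True,
        )
        return kernel_names(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        # The command is missing, failed, or printed something that is not JSON.
        return []


kernels = installed_kernels()
print("Registered kernels:", kernels)
# Assumption: the R kernel is registered under the default name "ir".
print("R kernel registered:", "ir" in kernels)
```

An empty list means either that no kernels are registered or that the jupyter command could not be run from this environment.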

Starting Jupyter

Now we are ready to start Jupyter. Close the R console and open a new Windows command prompt as explained in Step 2. Type jupyter notebook, then press Enter. Starting the notebook system may take a few seconds. Afterwards, a browser window opens, showing Jupyter’s homepage (see Figure 2). Congratulations, Jupyter has been set up successfully. If you want to shut down the notebook system, simply close the command window. Whenever you want to start it up again, open a new command window and run the jupyter notebook command again.


Figure 2. Jupyter home screen
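
The same start-up can also be scripted, which may be convenient when the notebooks live in a specific project folder. The following is a sketch only: launch_command and start_jupyter are hypothetical helper names, and the --notebook-dir option is assumed to select the folder shown on the home screen.

```python
import subprocess


def launch_command(notebook_dir=None):
    """Build the argument list for starting the notebook server."""
    command = ["jupyter", "notebook"]
    if notebook_dir is not None:
        # Assumption: --notebook-dir sets the folder listed on the home screen.
        command.append(f"--notebook-dir={notebook_dir}")
    return command


def start_jupyter(notebook_dir=None):
    """Start the server and block until it is stopped (for example with Ctrl+C)."""
    return subprocess.call(launch_command(notebook_dir))


# Show the command without actually starting the server.
print("Would run:", " ".join(launch_command()))
```

Calling start_jupyter() without arguments is intended to be equivalent to typing jupyter notebook in the command window.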


References

  1. Klein, R.A.; Ratliff, K.A.; Vianello, M. et al. (2014). "Investigating Variation in Replicability: A “Many Labs” Replication Project". Social Psychology 45 (3): 142–52. doi:10.1027/1864-9335/a000178. 
  2. Pashler, H.; Wagenmakers, E.J. (2012). "Editors' Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence?". Perspectives on Psychological Science 7 (6): 528–30. doi:10.1177/1745691612465253. PMID 26168108.
  3. Yong, E. (2012). "Replication studies: Bad copy". Nature 485 (7398): 298–300. doi:10.1038/485298a. PMID 22596136.
  4. Nosek, B.A.; Spies, J.R.; Motyl, M. (2012). "Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability". Perspectives on Psychological Science 7 (6): 615–31. doi:10.1177/1745691612459058. PMID 26168121.
  5. Donoho, D.L.; Maleki, A.; Rahman, I.U. et al. (2009). "Reproducible Research in Computational Harmonic Analysis". Computing in Science & Engineering 11 (1): 8–18. doi:10.1109/MCSE.2009.15.
  6. Shen, H. (2014). "Interactive notebooks: Sharing the code". Nature 515 (7525): 151–2. doi:10.1038/515151a. PMID 25373681. 
  7. Perez, F.; Granger, B.E. (2007). "IPython: A System for Interactive Scientific Computing". Computing in Science & Engineering 9 (3): 21–9. doi:10.1109/MCSE.2007.53. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically; this version lists them in order of appearance, by design.