Journal:Welcome to Jupyter: Improving collaboration and reproduction in psychological research by using a notebook system

Full article title	Welcome to Jupyter: Improving collaboration and reproduction in psychological research by using a notebook system
Journal	The Quantitative Methods for Psychology
Author(s)	Sprengholz, Phillipp
Author affiliation(s)	Friedrich-Schiller-Universität Jena
Editors	Cousineau, Denis
Year published	2018
Volume and issue	14(2)
Page(s)	137–46
DOI	10.20982/tqmp.14.2.p137
ISSN	2292-1354
Distribution license	Creative Commons Attribution 4.0 International
Website	http://www.tqmp.org/RegularArticles/vol14-2/p137/
Download	https://www.tqmp.org/RegularArticles/vol14-2/p137/p137.pdf (PDF)

Revision as of 20:39, 18 June 2018

This article should not be considered complete until this message box has been removed. This is a work in progress.

Abstract

The reproduction of findings from psychological research has proven difficult. Abstract descriptions of the data analysis steps performed by researchers are one of the main reasons why reproducing or even understanding published findings is so hard. With the introduction of Jupyter Notebook, a new tool for organizing both static and dynamic information became available. The software allows blending explanatory content like written text or images with code for preprocessing and analyzing scientific data. Thus, Jupyter helps document the whole research process, from ideation through data analysis to the interpretation of results. This fosters both collaboration and scientific quality by helping researchers organize their work. This tutorial is an introduction to Jupyter. It explains how to set up and use the notebook system and, while introducing its key features, demonstrates the advantages of using Jupyter Notebook for psychological research.

Keywords: Reproducible research, interactive scientific computing, collaboration, notebook systems, data management

Introduction

The replicability of psychological research has been increasingly questioned.^[1]^[2]^[3] Reproducing or even understanding research findings requires extensive knowledge about the experimental manipulations and methods used.^[4] Unfortunately, many research publications fail to describe the research process in detail, are difficult to understand without background information, or invite misinterpretation.^[5] Most articles include only very abstract descriptions of data preparation and analysis steps, which makes them hard for readers to follow. Consequently, reproducing results from psychological journals is practically impossible.^[6] The scientific community has tried to solve these problems by publishing supplemental information online, including raw data as well as detailed descriptions of data preprocessing and analysis steps. Unfortunately, this information is often organized in a confusing way.

Jupyter, a web application developed by a group of scientists and based on IPython,^[7] offers a way to address these problems. Jupyter enables users to create and share notebooks containing text, visualizations, equations, raw data, and code for analyzing and transforming these data. By blending static content like explanatory text and images with the dynamic output of calculations and data analysis procedures, the notebooks follow the prose-first approach originally introduced by Mathematica Notebooks more than 20 years ago. The entire research process, including ideation, data acquisition, analysis, and interpretation of results, can be documented in a linear, story-like way. Publishing these notebooks alongside or instead of read-only journal articles may enhance both the replication of results and collaboration between researchers.

This tutorial is written for readers with no previous experience using Jupyter. It explains how to set up and use Jupyter's notebooks for organizing, performing, and documenting data analysis tasks common in psychological research. Jupyter supports more than 90 programming languages, enabling you to analyze data using scripts written in Python, R, or virtually any other non-proprietary scripting language. However, this article focuses strictly on R. After the system has been set up, an example notebook will be created step by step.

Setting up Jupyter

Setting up Jupyter on your local computer involves three steps. First, Python needs to be installed, as it is required to run the notebook system. Second, Jupyter itself is installed. Finally, R is installed and configured to work with Jupyter. All three steps are detailed below. Since most readers are assumed to work on Microsoft Windows, the explanations are tailored to this operating system. However, Jupyter can also be set up on both Mac OS and Linux, and the installation steps are nearly identical.

Step 1: Installing Python

Download the latest Python 3 installer from Python.org (the current version at the time of writing is 3.6.4). When starting the installer, use the default settings, but make sure Python is added to your system's PATH variable (see Figure 1).

Figure 1. Python installer

Step 2: Installing Jupyter

After Python has been installed, open a command window: press Win + R on your keyboard, type cmd, and press Enter. Then enter the following line into the command window and press Enter again: pip install jupyter
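To confirm that Steps 1 and 2 worked, a short check can be run from the same command window. The sketch below is not part of the original tutorial; it uses only the Python standard library and simply looks for the python and jupyter commands on the PATH, which the installer option shown in Figure 1 and pip install jupyter are meant to provide. The function name check_setup is hypothetical.

```python
import shutil
import sys


def check_setup(required=(3, 0)):
    """Report whether Python 3 runs and whether python/jupyter are on the PATH."""
    return {
        # sys.version_info[:2] is a tuple like (3, 6) and compares element-wise
        "python_ok": sys.version_info[:2] >= tuple(required),
        # shutil.which returns an executable's full path, or None if it is not on PATH;
        # on Mac OS and Linux the interpreter is often called python3
        "python_on_path": any(shutil.which(name) for name in ("python", "python3")),
        # the jupyter command is what pip install jupyter is expected to provide
        "jupyter_on_path": shutil.which("jupyter") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print(item, "yes" if ok else "NO")
```

Saved under a name such as check_setup.py (hypothetical), it can be run with python check_setup.py; a NO points to the installation step that needs another look.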

Step 3: Installing R and the R kernel

Download the latest R installer from R-Project.org (the current version at the time of writing is 3.4.4). Make sure to select the base installation for Windows. Then run the installer using the default settings.

Finally, Jupyter must be connected to R by installing the R kernel. Open the R console by starting R.exe (found under C:\Program Files\R\R-3.4.4\bin; the folder name depends on the installed R version). Copy the following command into the console window and press Enter: install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))

This downloads a set of packages required by the R kernel. You may be asked to create a personal library; respond with "yes". If you are asked to select a CRAN mirror, choose one close to your current location, as this speeds up the download. While the packages are being retrieved, several warnings may be printed in the console window; they can be ignored. After all packages have been downloaded, execute the following command in the R console: devtools::install_github('IRkernel/IRkernel')

This installs the R kernel. Any warnings printed along the way can again be ignored. Afterwards, we need to make sure Jupyter recognizes the newly installed kernel. To do so, register its kernel spec by executing the following command in the R console: IRkernel::installspec()
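The three R console commands above can also be run non-interactively through R's Rscript front end. The following Python sketch is an assumption rather than part of the original instructions: because a non-interactive R session cannot prompt for a CRAN mirror, it passes repos explicitly (the cloud mirror URL is one choice among many), and it assumes Rscript is on the PATH unless you pass the full path to Rscript.exe from the bin folder mentioned above. If R would normally ask you to create a personal library, the non-interactive run may stop with an error instead; in that case, use the R console as described.

```python
import subprocess

# Packages required by the R kernel, as listed in Step 3
R_PACKAGES = ["repr", "IRdisplay", "evaluate", "crayon", "pbdZMQ", "devtools", "uuid", "digest"]
CRAN_MIRROR = "https://cloud.r-project.org"  # assumed mirror; any CRAN mirror should work


def r_setup_commands(rscript="Rscript"):
    """Build Rscript invocations mirroring the three console commands of Step 3."""
    pkgs = ", ".join(f"'{p}'" for p in R_PACKAGES)
    expressions = [
        f"install.packages(c({pkgs}), repos = '{CRAN_MIRROR}')",
        "devtools::install_github('IRkernel/IRkernel')",
        "IRkernel::installspec()",
    ]
    # Rscript evaluates the R expression given after -e
    return [[rscript, "-e", expr] for expr in expressions]


def run_r_setup(rscript="Rscript"):
    """Run the commands in order; check=True stops at the first one that fails."""
    for cmd in r_setup_commands(rscript):
        subprocess.run(cmd, check=True)
```

Calling run_r_setup() runs the three commands in sequence; printing r_setup_commands() first shows exactly what would be executed.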

Starting Jupyter

Now we are ready to start Jupyter. Close the R console and open a new Windows command prompt as explained in Step 2. Type jupyter notebook, then press Enter. Starting the notebook system may take a few seconds. Afterwards, a browser window opens, showing Jupyter’s home screen (see Figure 2). Congratulations, Jupyter has been set up successfully. To shut down the notebook system, simply close the command window. Whenever you want to start it again, open a new command window and run the jupyter notebook command again.

Figure 2. Jupyter home screen
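If R is missing from the New menu on the home screen, the R kernel was probably not registered. One way to check is to ask Jupyter which kernels it knows about. The sketch below assumes your Jupyter version supports jupyter kernelspec list --json, and relies on IRkernel::installspec() registering the kernel under the name ir by default; the function names are hypothetical.

```python
import json
import subprocess


def parse_kernelspecs(json_text):
    """Map kernel names to display names from `jupyter kernelspec list --json` output."""
    data = json.loads(json_text)
    return {
        name: info.get("spec", {}).get("display_name", name)
        for name, info in data.get("kernelspecs", {}).items()
    }


def list_kernels():
    # universal_newlines=True returns text output and also works on Python 3.6
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return parse_kernelspecs(result.stdout)


if __name__ == "__main__":
    try:
        kernels = list_kernels()
    except (OSError, ValueError, subprocess.CalledProcessError) as err:
        print("Could not query Jupyter:", err)
    else:
        print(kernels)
        if "ir" not in kernels:
            print("No 'ir' kernel found; run IRkernel::installspec() in R again.")
```

Parsing is kept separate from the subprocess call so the output format can be inspected or tested without a running Jupyter installation.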

Creating and editing a notebook

When looking at Jupyter’s home screen, you will see your computer’s user directory. By default, the notebook app can only access files within this directory and its subfolders. Navigate to a place where you want to store your notebooks. You can create a new folder by clicking New → Folder and rename it afterwards by selecting it and clicking Rename. After choosing a folder, create a new notebook in it by clicking New → Notebook: R. A new browser window opens, showing the empty notebook you just created. Each notebook consists of vertically ordered cells holding either explanatory content or code. The input of each cell can be interpreted (i.e., run) by Jupyter, producing well-formatted output. Figure 3 shows an example. As we can see on the left side, this notebook contains multiple cells. When they are run (by pressing the play button at the top of the page), they are rendered as shown on the right side.

Figure 3. Example notebook

In our empty notebook, we can easily create new cells by clicking the plus button. Before filling the cells, we have to decide on the type of content: each cell can be either a Markdown cell or a code cell. You can change the cell type by clicking Cell → Cell Type in the menu. To get a deeper understanding of the two types, we will use our newly created notebook to analyze example Big Five personality data retrieved from the Personality Project. First, we will note down some conceptual basics using Markdown cells. Afterwards, we will load the data and analyze it using code cells. As the algorithms may require some explanation, code cells should alternate with explanatory Markdown cells. The final result can be previewed and downloaded here. Before entering the first cell, let’s change the name of our empty notebook: click on the title at the top of the page and change it to something like Working with Personality Data.
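Behind the browser interface, a notebook is saved as a JSON file with the extension .ipynb, in which the cells are stored as an ordered list, each tagged with its type. The sketch below builds such a file with the Python standard library, alternating a Markdown cell and a code cell as suggested above. It assumes version 4 of the notebook format and the kernel name ir registered by IRkernel; the helper names, the R line in the code cell, and the file name are illustrative placeholders, not the tutorial's actual notebook.

```python
import json


def markdown_cell(source):
    return {"cell_type": "markdown", "metadata": {}, "source": source}


def code_cell(source):
    # Code cells additionally store an execution counter and their outputs
    return {"cell_type": "code", "execution_count": None, "metadata": {},
            "outputs": [], "source": source}


def new_r_notebook(cells):
    """Assemble a minimal notebook (format 4) that opens with the R kernel."""
    return {
        "cells": cells,  # rendered top to bottom in this order
        "metadata": {
            "kernelspec": {"name": "ir", "display_name": "R", "language": "R"},
            "language_info": {"name": "R"},
        },
        "nbformat": 4,
        "nbformat_minor": 2,
    }


def save_notebook(nb, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
```

For example, save_notebook(new_r_notebook([markdown_cell("# Working with Personality Data"), code_cell("summary(c(3, 4, 5))")]), "example.ipynb") would write a file that shows up on Jupyter's home screen when saved in the notebook folder.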

Markdown cells

Markdown cells are used for explanatory static content like text, images, and mathematical expressions. The content is styled and formatted using the popular Markdown syntax, and HTML tags can be used as well. Furthermore, mathematical expressions can be added to Markdown cells using LaTeX syntax. When Markdown cells are interpreted, Jupyter formats their content and presents it in an easy-to-read way. In summary, Markdown cells can present static content in a way comparable to current psychological journal publications. Let’s have a closer look at some examples.
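The sketch below, which is not part of the original tutorial, assembles in Python the source text of a single Markdown cell that mixes the three kinds of markup mentioned above: Markdown emphasis, an HTML tag, and LaTeX formulas between dollar signs (single for inline math, double for a displayed formula). The sample-mean formula and the wording are arbitrary illustrations; the printed text can be pasted into a Markdown cell.

```python
def math_inline(latex):
    # Single dollar signs: formula rendered within the line of text
    return f"${latex}$"


def math_display(latex):
    # Double dollar signs: formula rendered on its own line
    return f"$${latex}$$"


MEAN = r"\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i"

cell_source = "\n\n".join([
    "The **sample mean** of the scores " + math_inline(r"x_1, \dots, x_n") + " is",
    math_display(MEAN),
    "<u>Underlined</u> text needs an HTML tag, since Markdown has no underline syntax.",
])

if __name__ == "__main__":
    print(cell_source)
```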

Heading, bold, and italic text

Headings can be used to structure texts. In Markdown, a heading has to be on its own line and preceded by hash signs (#). The number of hash signs defines the outline level of a heading. Text can be emphasized using bold or italic style: letters, words, or groups of words surrounded by single asterisks (*) are printed in italics, whereas two asterisks (**) cause bold printing. Try adding the content shown on the left side of Figure 4 to your first Markdown cell. After running the cell, you should see well-formatted output containing headings as well as bold and italic text, as shown on the right side of the figure.

Figure 4. Formatting headings, bold, and italic content
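Because the content of Figure 4 is available only as an image, the following Python sketch spells out the same rules as text: a heading is a line starting with one to six hash signs, single asterisks mark italics, and double asterisks mark bold. The helper names and the sample text are hypothetical and do not reproduce the figure; the resulting text could be pasted into a Markdown cell.

```python
def heading(level, text):
    """Markdown heading: the number of hash signs sets the outline level (1 to 6)."""
    if not 1 <= level <= 6:
        raise ValueError("Markdown supports heading levels 1 to 6")
    return "#" * level + " " + text


def italic(text):
    return f"*{text}*"   # one asterisk on each side


def bold(text):
    return f"**{text}**"  # two asterisks on each side


# Each heading sits on its own line, as Markdown requires
source = "\n".join([
    heading(1, "The Big Five"),
    heading(2, "Openness"),
    "Openness is " + italic("one") + " of " + bold("five") + " broad traits.",
])

if __name__ == "__main__":
    print(source)
```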

Links and images

Links to websites or external data can be added to Markdown cells too. Simply surround the link’s name with square brackets, followed by the target address in parentheses. You can also include images in your notebook. To do so, use an exclamation mark (!), followed by an image title in square brackets and the image’s address in parentheses. If the image is stored in the same location as your notebook, you do not need to provide its full address; its filename is sufficient. Add another Markdown cell to your notebook containing the text from Figure 5. When you run this cell, you should see both a link and an image of the Big Five retrieved from Wikimedia Commons.

Figure 5. Formatting links and images
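The text of Figure 5 is likewise available only as an image, so the sketch below builds the link and image syntax described above in Python. The Personality Project address is given for illustration, and big_five.png is a hypothetical file assumed to sit in the same folder as the notebook, so its bare filename is enough.

```python
def link(name, target):
    # [name](target): the name is displayed, the target opens on click
    return f"[{name}]({target})"


def image(title, address):
    # An exclamation mark in front of the link syntax embeds the image instead
    return f"![{title}]({address})"


cell_source = "\n\n".join([
    link("Personality Project", "https://personality-project.org"),
    # hypothetical local file; a bare filename refers to the notebook's own folder
    image("Big Five", "big_five.png"),
])

if __name__ == "__main__":
    print(cell_source)
```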

Lists and tables

Markdown supports both numbered and unnumbered lists. Starting a new line of text with a number and a period (1.) defines an item of a numbered list; starting it with a hyphen (-) instead defines an item of an unnumbered list. Tables can be rendered too, using a more complex syntax. Figure 6 shows an example. When you copy the text from the left side into a new Markdown cell and run the cell, a table containing example traits is printed, along with an unnumbered list of example items.

Figure 6. Formatting lists and tables
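As a textual counterpart to Figure 6, the following sketch generates a numbered list, an unnumbered list, and a table in the pipe syntax of GitHub-flavored Markdown, which Jupyter renders: a header row, a separator row of dashes, then one line per data row. The trait names are the Big Five; the questionnaire items in the unnumbered list are hypothetical placeholders, not the items shown in the figure.

```python
def numbered_list(items):
    # "1. ", "2. ", ... at the start of each line
    return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))


def bullet_list(items):
    # A hyphen at the start of each line marks an unnumbered item
    return "\n".join(f"- {item}" for item in items)


def table(header, rows):
    """Pipe table: header row, separator row of dashes, then one line per row."""
    def line(cells):
        return "| " + " | ".join(str(c) for c in cells) + " |"
    separator = "|" + "|".join("---" for _ in header) + "|"
    return "\n".join([line(header), separator] + [line(r) for r in rows])


TRAITS = ["Openness", "Conscientiousness", "Extraversion", "Agreeableness", "Neuroticism"]

cell_source = "\n\n".join([
    table(["Trait", "Letter"], [[t, t[0]] for t in TRAITS]),
    bullet_list(["Hypothetical item one", "Hypothetical item two"]),
])

if __name__ == "__main__":
    print(cell_source)
```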

References

[1] Klein, R.A.; Ratliff, K.A.; Vianello, M. et al. (2014). "Investigating Variation in Replicability: A “Many Labs” Replication Project". Social Psychology 45 (3): 142–52. doi:10.1027/1864-9335/a000178.
[2] Pashler, H.; Wagenmakers, E.J. (2012). "Editors' Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence?". Perspectives on Psychological Science 7 (6): 528–30. doi:10.1177/1745691612465253. PMID 26168108.
[3] Yong, E. (2012). "Replication studies: Bad copy". Nature 485 (7398): 298–300. doi:10.1038/485298a. PMID 22596136.
[4] Nosek, B.A.; Spies, J.R.; Motyl, M. (2012). "Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability". Perspectives on Psychological Science 7 (6): 615–31. doi:10.1177/1745691612459058. PMID 26168121.
[5] Donoho, D.L.; Maleki, A.; Rahman, I.U. et al. (2009). "Reproducible Research in Computational Harmonic Analysis". Computing in Science & Engineering 11 (1): 8–18. doi:10.1109/MCSE.2009.15.
[6] Shen, H. (2014). "Interactive notebooks: Sharing the code". Nature 515 (7525): 151–2. doi:10.1038/515151a. PMID 25373681.
[7] Perez, F.; Granger, B.E. (2007). "IPython: A System for Interactive Scientific Computing". Computing in Science & Engineering 9 (3): 21–9. doi:10.1109/MCSE.2007.53.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases, important information was missing from the references, and that information was added. The original article lists references alphabetically; this version lists them in order of appearance, by design.

