Difference between revisions of "Journal:GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration"


Revision as of 00:34, 10 January 2024

Full article title: GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration
Journal: Digital Discovery
Author(s): Scroggie, Kymberley R.; Burell-Sander, Klementine J.; Rutledge, Peter J.; Motion, Alice
Author affiliation(s): University of Sydney
Primary contact: Email: alice dot motion at sydney dot edu dot au
Year published: 2023
Volume and issue: 2
Page(s): 1188-1196
DOI: 10.1039/D3DD00032J
ISSN: 2635-098X
Distribution license: Creative Commons Attribution-NonCommercial 3.0 Unported
Website: https://pubs.rsc.org/en/content/articlehtml/2023/dd/d3dd00032j
Download: https://pubs.rsc.org/en/content/articlepdf/2023/dd/d3dd00032j (PDF)

Abstract

Electronic laboratory notebooks (ELNs) have expanded the utility of the paper laboratory notebook beyond that of a simple record keeping tool. Open ELNs offer additional benefits to the scientific community, including increased transparency, reproducibility, and integrity. A key element underpinning these benefits is facile and expedient knowledge sharing which aids communication and collaboration. In previous projects, we have used LabTrove and LabArchives as open ELNs, in partnership with GitHub (an open-source web-based platform originally developed for collaborative coding) for communication and discussion. Here we present our personal experiences using GitHub as the central platform for many aspects of the scientific process, including version-controlled recording of experiments, results and interpretation, data storage, project management, workflows, communication, and collaboration. We report on the utility of GitHub as an open ELN for chemistry research, and we discuss our experiences employing it with the Open Source Mycetoma and Open Source Tuberculosis consortia. By outlining its features and shortcomings through their implementation in our work, we demonstrate how using GitHub as a central platform can aid the real-time sharing of knowledge and collaboration, and further democratize scientific research within both open and traditional research models.

Keywords: electronic laboratory notebook, ELN, GitHub, data sharing, knowledge sharing, chemistry, open research

Introduction

Technological advances have allowed scientists to move beyond the primitive utility of the paper laboratory notebook as a record-keeping tool. In 1994, Borman noted that electronic laboratory notebooks (ELNs) “could revolutionize how scientists record their research, manage their data, and share their information with others.” [1] ELNs have indeed been integrated into laboratory information management systems (LIMS) and electronic laboratory environments (ELEs), but they have also revolutionized the way in which scientists disseminate knowledge, particularly through the internet.

ELNs enable knowledge sharing, facilitating faster transfer of knowledge and collaboration, which in turn expedites future knowledge generation and improves research efficiency. [2,3] The digital storage of information further increases efficiency with greater longevity, readability, and searchability. Despite these benefits, the shift away from paper to electronic has been an evolutionary process rather than a revolutionary one, and scientists—particularly those in academia—have been slow to accept and adopt ELNs. [4]

The ability of scientists to move to electronic documentation of their work with minimal disruption has been identified as the key factor for broader acceptance of ELNs in an academic setting. [5] However, the highly diverse nature of different disciplines within academia leads to a broad range of specific needs that require highly specialized or custom ELNs to effect a seamless transition. While some commercial ELNs can support many specialized requirements, their licensing and maintenance costs often put them out of reach for individual academic research groups. [6,7] Instead, many have made use of generic, freely available platforms such as OneNote [8], EverNote [9,10], or Google Docs [11], with others developing their own ELNs to reap the specific benefits they require. [12–14]

We at the University of Sydney have successfully used several different ELNs for our own work as part of different open-source drug discovery consortia, including Open Source Malaria [15], Open Source Mycetoma [16], and Open Source Tuberculosis. Open-source drug discovery is a new approach to drug discovery in which all aspects of research are shared publicly and in real time (i.e., immediately as they are produced) to facilitate collaboration and knowledge sharing. [17] These consortia follow the principles of open science, in which scientific knowledge is developed collaboratively and made freely accessible to any interested parties [18], and more specifically Todd's Six Laws of Open Science. [19]

In line with openly sharing our research, we have hosted ELNs on the open-source software platform LabTrove [20] and the commercial ELN LabArchives, while simultaneously using GitHub to support discussion and collaboration. To bring together the sharing of knowledge and collaboration into a single open and central location, we have now explored the use of GitHub itself as the ELN (Fig. 1). Using GitHub as both an ELN and a hub for instant communication elevates it to the status of a “collaboratory” as envisioned by Wulf, as a “centre without walls, in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resource, and accessing information in digital libraries.” [21]

This article draws on the experiences of two of the authors using GitHub as an ELN for various synthetic chemistry projects and provides preliminary findings into its usability. We report on the utility of GitHub as an open ELN, detail its features in this dimension, and discuss its implementation for open-source drug discovery. We also share an ELN template GitHub repository for those considering alternative ELNs. While we have used GitHub as an open ELN, and repositories are open by default, we note that for projects that require confidentiality or follow a traditional research methodology, information and data can be held within closed repositories with access limited to only invited users.



Figure 1. How we share scientific data and knowledge with the community in real-time.

GitHub

GitHub is a web-based graphical interface for Git, an open-source version control system. It was originally designed for software developers to work collaboratively on open-source code; however, in recent years the GitHub community has expanded. After software development, education, and data, science now represents the fourth largest category of users. [22] Examples range from machine learning programs like the tuberculosis and lung cancer screening initiative AiAi.care Project, to organic chemistry applications, including reaction visualizers, spectroscopic databases, and chemistry learning tools. Open Source Malaria, Open Source Mycetoma, Open Source Tuberculosis, and Open Source Antibiotics represent four examples of open-source drug discovery hosted on the platform, making use of GitHub's forum-like structure to facilitate open, real-time collaboration and discussion among teams of scientists all around the world.

The version control enabled by Git is directly transferable to ELNs. Importantly, for the validity and verifiability of scientific research, using Git enables users to keep track of the who, what, when, and even why: when saving changes, GitHub offers the option to provide a short description of what was changed and why the change was made. This record-keeping enables greater transparency, making it easy to see if an edit was made to fix typos, add information, or alter data, and it is crucial in maintaining data integrity and preventing misunderstanding or misuse of data. [23] Furthermore, all activities are attributed to the user via their display name, bestowing a level of accountability and responsibility, while also ensuring that contributors receive attribution for their work.
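To make the who, what, when, and why concrete, the following is a minimal sketch of how that record could be read back out of a Git history. It is not part of the workflow described in this article: the `Change` class and the function names are hypothetical, and the sketch assumes Git is installed locally and that the description entered when saving a change is stored as the commit subject and body.

```python
import subprocess
from dataclasses import dataclass

# ASCII unit/record separator bytes are used as delimiters so that
# multi-line commit descriptions cannot be mistaken for field boundaries.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
# git pretty-format placeholders:
# %an = author name, %aI = author date (strict ISO 8601),
# %s = subject line, %b = body, %x1f / %x1e = literal separator bytes
LOG_FORMAT = "%an%x1f%aI%x1f%s%x1f%b%x1e"


@dataclass
class Change:
    """One saved change (hypothetical container for illustration)."""
    who: str   # author display name
    when: str  # ISO 8601 timestamp
    what: str  # short description of the change
    why: str   # optional longer explanation (may be empty)


def parse_git_log(raw: str) -> list[Change]:
    """Split raw `git log` output produced with LOG_FORMAT into records."""
    changes = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")
        if not record:
            continue
        who, when, what, why = record.split(FIELD_SEP, 3)
        changes.append(Change(who, when, what, why.strip()))
    return changes


def read_history(repo_path: str) -> list[Change]:
    """Run `git log` in a local clone and return its parsed history."""
    raw = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_git_log(raw)
```

Calling `read_history(".")` inside a local clone of a repository would return one `Change` per commit, newest first, which is enough to see at a glance whether a given edit fixed a typo, added information, or altered data.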

A number of user interfaces (UIs) for Git exist, including GitHub, GitLab, and Gitea, each offering slightly different user experiences. Each can be used as an ELN, as described in this article; however, GitHub is more openly accessible and offers additional UI features (e.g., a Discussions tab for public discourse), making it more suitable for hosting open-source and collaborative projects.

GitHub's accessibility is also important to the open science ethos. No account or subscription is required to view work within a public GitHub repository, allowing people to access data without concerns of cost or association with institutions. Through a standard internet browser, anyone can view content as soon as it is published without the researcher needing to “share” their work, or the reader having to access any proprietary products. In contrast, viewing content on GitLab requires an account and being signed in, while Gitea is a self-hosted UI.
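The same open access extends beyond the browser. As an illustration only (this is not something the article's workflow relies on), the sketch below reads the issues of a public repository through GitHub's REST API without any account or token. The function names are hypothetical; the example repository is the one linked in Figure 4. Unauthenticated requests are rate-limited, only the first page of results is returned, and this endpoint also lists pull requests alongside issues.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def issues_url(owner: str, repo: str, state: str = "all") -> str:
    """Build the REST API URL that lists a repository's issues.

    `state` may be "open", "closed", or "all".
    """
    if state not in {"open", "closed", "all"}:
        raise ValueError(f"unsupported state: {state!r}")
    query = urllib.parse.urlencode({"state": state})
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def fetch_issues(owner: str, repo: str, state: str = "all") -> list:
    """Fetch the first page of issues from a public repository.

    No credentials are sent, so this only works for public repositories.
    """
    request = urllib.request.Request(
        issues_url(owner, repo, state),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires network access; prints the number and title of each issue.
    for issue in fetch_issues("TheBreakingGoodProject", "ELN-Kymberley-Scroggie"):
        print(issue["number"], issue["title"])
```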

Not only is the content on GitHub openly accessible, but users can connect to content on GitHub in different ways: from the web-based site, desktop app, or mobile app. The mobile app is available for Android and iOS and is easy to use on a standard smartphone or tablet. Many popular ELNs are primarily laptop-based [23], and while no research has yet specifically examined the use of mobile apps for ELNs, we envision that this mode of access will improve record-keeping in laboratory settings due to the ease of access, portability, and ubiquity of mobile devices. Most, if not all, researchers are able to access the GitHub app on their device to swiftly read through past methods, add details and observations in the moment, or snap a photo for the ELN. A similar sentiment has been expressed by others who suggest that many researchers are likely to prefer mobile-based ELNs for their portability and extra features, like the built-in camera and option to annotate images using a stylus. [24,25]

We have used GitHub repositories as an ELN for both laboratory-based synthetic projects and computer-based social science projects, and we describe our experiences using it in the synthetic chemistry laboratory as a case study below.

ELN structure and utility

At the top layer, GitHub uses repositories to organize and store data and information. Each repository has Code, Issues, Discussions, Projects, and Wiki tabs, all of which contribute to the ELN workflow (Fig. 2). Repositories also contain Pull Requests, Actions, and Insights tabs, which are currently not used in our ELN workflow, along with Security and Settings tabs, which are not discussed here.



Figure 2. Overview of a GitHub ELN repository and its structure and utility.

Repositories can be set up either by an individual or an organization (e.g., research group) and assigned to individuals. Within Open Source Mycetoma and Open Source Tuberculosis there are topic-specific repositories which support discussion and collaboration, while ELN repositories are created by individuals and linked to the relevant organization's repositories. This gives researchers the freedom to organize ELN repositories in a way that suits their individual needs. For example, while multiple projects can be contained in a single repository, a researcher may choose to have multiple repositories, one for each project they are involved in. Alternatively, a research group could set up repositories for each project with all researchers working on the project contributing to the single repository. Either way, an overview of all repositories can be viewed on both the individual's and organization's profile. This interconnectivity of related work and segregation of distinct topics makes GitHub a useful tool, not only as an ELN, but also as a platform for the presentation of research and collaboration.

Notebook pages

The Issues tab is used to house notebook pages: each represents an individual experiment and contains all the essential information, including the title, aim, quantity of reagents, methods, results, discussion, and conclusions, along with linked references. In creating a new issue, the title, hyperlink to the risk assessment, reaction scheme (uploaded as an image), and table of reagents are posted. Plain text is formatted using Markdown, creating headings, tables, and hyperlinks to aid clarity and readability. Making use of the forum-like structure, each subsequent addition to the notebook page is posted as a comment and conveniently timestamped. All experimental, observational, and analytical data are also uploaded to the relevant issue. Once an experiment is completed, the issue is "closed." This keeps the Issues landing page free of clutter and "open" issues (active experiments) easily accessible, as open and closed issues are segregated. Examples of a typical Issues landing page and notebook page are shown in Figs. 3 and 4, respectively.
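As a rough illustration of what the opening post of such a notebook page might contain, the sketch below assembles a Markdown body with a risk assessment link, a reaction scheme image, and a reagent table. It is an assumption-laden example rather than the authors' template: the function names, section headings, and table columns are hypothetical, and the URLs in the usage note are placeholders.

```python
def reagent_table(reagents):
    """Render (name, equivalents, amount) tuples as a Markdown table.

    The three columns are an assumption made for this example.
    """
    header = "| Reagent | Equiv. | Amount |"
    divider = "| --- | --- | --- |"
    rows = [f"| {name} | {equiv} | {amount} |" for name, equiv, amount in reagents]
    return "\n".join([header, divider, *rows])


def new_page_body(risk_assessment_url, scheme_image_url, reagents):
    """Assemble the Markdown body for the first post of a notebook page.

    The experiment title is not included here because it is entered
    separately as the issue title.
    """
    return "\n\n".join([
        f"[Risk assessment]({risk_assessment_url})",
        "## Reaction scheme",
        f"![Reaction scheme]({scheme_image_url})",
        "## Reagents",
        reagent_table(reagents),
    ])
```

For example, `new_page_body("https://example.org/risk-assessment.pdf", "https://example.org/scheme.png", [("Reagent A", "1.0", "100 mg")])` returns text that can be pasted into the new-issue form; later observations and data would then be added as comments, as described above.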



Figure 3. An example of the Issues landing page showing four recent experiments.


Figure 4. Example of a new issue as an ELN entry, including the title, hyperlink to the risk assessment, reaction scheme, and table of reagents formatted using Markdown. A completed experiment example can be found at https://github.com/TheBreakingGoodProject/ELN-Kymberley-Scroggie/issues/57.

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.