Journal:Chemotion ELN: An open-source electronic lab notebook for chemists in academia

From LIMSWiki
Revision as of 18:36, 2 November 2017 by Shawndouglas (talk | contribs) (Finished adding rest of content.)
Full article title Chemotion ELN: An open-source electronic lab notebook for chemists in academia
Journal Journal of Cheminformatics
Author(s) Tremouilhac, Pierre; Nguyen, An; Huang, Yu-Chieh; Kotov, Serhii; Lütjohann, Dominic Sebastian; Hübsch, Florian; Jung, Nicole; Bräse, Stefan
Author affiliation(s) Karlsruhe Institute of Technology, Cubuslab GmbH, Ninja-Concept GmbH
Primary contact Email: nicole dot jung at kit dot edu
Year published 2017
Volume and issue 9
Page(s) 54
DOI 10.1186/s13321-017-0240-0
ISSN 1758-2946
Distribution license Creative Commons Attribution 4.0 International
Website https://jcheminf.springeropen.com/articles/10.1186/s13321-017-0240-0
Download https://jcheminf.springeropen.com/track/pdf/10.1186/s13321-017-0240-0 (PDF)

Abstract

The development of an electronic laboratory notebook (ELN) for researchers working in the field of chemical sciences is presented. The web-based application is available as open-source software that offers modern solutions for chemical researchers. The Chemotion ELN is equipped with the basic functionalities necessary for the acquisition and processing of chemical data, in particular work with molecular structures and calculations based on molecular properties. The ELN supports planning, description, storage, and management for the routine work of organic chemists. It also provides tools for communicating and sharing the recorded research data among colleagues. Meeting the requirements of a state-of-the-art research infrastructure, the ELN allows the search for molecules and reactions not only within the user’s data but also in conventional external sources as provided by SciFinder and PubChem. The presented development makes allowance for the growing dependency of scientific activity on the availability of digital information by providing open-source instruments to record and reuse research data. The current version of the ELN has been used for over half a year in our chemistry research group, serving as a common infrastructure for chemistry research and enabling chemistry researchers to build their own databases of digital information as a prerequisite for the detailed, systematic investigation and evaluation of chemical reactions and mechanisms.

Keywords: Electronic lab notebook, digitization, open source, Ruby on Rails, compound management

Background

In the field of organic chemistry, as in any research area, the availability of digital data is a prerequisite for sustainable and successful research as it allows for the access of results, the search for information, and the processing of obtained research data.[1][2][3] Due to the ever-growing accumulation of information resulting from the constant saving and recording of data, it is imperative to improve data management with a digital system. Following the data life cycle, this enables the increase of knowledge by computing methods.[4][5][6] However, the lack of accessible and sufficiently mapped data limits current research, and the need to improve the situation has been stated many times before.[7][8][9] Therefore, the maintenance of systems for digital data acquisition, management, and storage is a key factor for efficient research activity.[10][11][12] The need for digitalization of data and its systematic storage presents challenges for the scientist, their institution providing the research infrastructure, and their scientific community. In the past, the discussion about the generation of and access to digital research information was mainly limited to published research data.[10][13][14] During the last two decades this accessibility has been improved drastically due to the availability of publications in online editions of scientific journals and the online support of standard commercial databases like SciFinder[15] and Reaxys[16], as examples in chemical research. These developments have facilitated the search for published information, whereas solutions for a comprehensive digital storage and availability of all other research data, including data directly recorded in the laboratories, are still missing or lagging due to the challenging requirements of the research infrastructure in academia.
The establishment of infrastructure in academic institutions is particularly difficult due to missing standards or policies in data handling and storage, diverse work practices, the prevalence of used equipment, and the limited budget for fundamental improvements. In natural sciences, the digitization of research data, as the basis for a later availability of the results and procedures, has to be implemented directly in the daily routine of scientists. Specific aspects of laboratory work have to be reflected in the electronic data acquisition and storage system depending on the research field. Although several electronic lab notebooks (ELNs) have been developed during the last few years, offering intelligent solutions for the documentation of research data (like sciNote[17], Biovia ELN[18], EMEN[19], openBIS ELN-LIMS[20], LabFolder[21], and others[22][23][24][25][26][27][28][29]), only a few electronic lab notebooks are dedicated to the chemical sciences.[30][31] In the chemical sciences in particular, challenges arise with the drawing and processing of chemical structures, a crucial and central step for the correlation of research data with the corresponding chemical transformation or structure.[32] Examples of systems in chemistry that offer the necessary support of chemical structures are the PerkinElmer E-Notebook for Chemistry[33], Indigo ELN[34], LabTrove[35][36][37], and open enventory.[38] These existing systems have already been in use by several groups and researchers. However, the sporadic implementation still reflects a mismatch between the offered solutions and the actual needs and resources of the chemists and their research facilities. This might be due to the highly specific requirements for the software to reflect fast-moving research: suitable ELNs have to be readily obtainable, adaptable, and modular without incurring additional costs. These features can probably only be offered by an open-source project.
In addition, a suitable, state-of-the-art system for sustainable research management should support communication with additional external databases and repositories, as well as connections to external devices and storage systems[39] of analytical results. Other important aspects are the embedding of calculation methods, and the possible extension of source code to the needs of other fields of chemistry (e.g., surface chemistry) and related domains of research (e.g., biology). As the identified criteria for a system to face the challenges of professional data management in academia could not be fulfilled by the currently available open-source systems, we initiated the development of a powerful ELN for chemical sciences. Such an ELN should offer the features currently lacking in available systems while keeping a flexible internal structure. Future extensions and adaptations to the needs of progressive chemistry research should be possible with minimal effort. The development of the Chemotion ELN resulted in a system with modern infrastructure that offers intelligent support of academic research projects as key instruments for the acquisition, storage, and management of digital data in chemistry.

Implementation

The Chemotion ELN was programmed in Ruby, JavaScript, HTML, and CSS. The back-end server is built on the Ruby on Rails framework using the PostgreSQL relational database, while the front-end user interface is mainly constructed with the ReactJS framework to serve a single-page application (Fig. 1). Ruby on Rails is based on Ruby, a scripting language, which enables fast development with a clear model-view-controller (MVC) structure. ReactJS, in turn, separates document object model (DOM) manipulations from data flow and decomposes entangled structures for sophisticated user interactions. As a result, people who want to expand the features of the Chemotion ELN or start a new related project face a less steep learning curve when working through the logic. Ruby package management allows users to easily integrate an external package from a public code repository. The ELN was programmed in such a way as to be customizable through this practical package management. Plugins specific to the ELN can also be written as a Rails::Engine so as to extend the ELN database, server-side functions, and user interface. Adding web pages, or even modifying the main application page produced with ReactJS modules, is possible.



Figure 1. ELN architecture diagram: Summary of programming languages, front-end input/output, and external service connection

Results

The Chemotion ELN offers an extended management system for projects, allowing the formation of a clear structure for research data. The organization of projects is implemented by the sorting of individual elements according to collections. Collections can be generated, edited, and deleted via a separate organizer, which enables the establishment of a user-defined ELN structure. Changes within the collections can be easily performed via drag-and-drop of selected elements, allowing for a fast hierarchical organization of collections of elements. This organization can be modified at any time, reflecting possible changes of research projects in a flexible manner (Fig. 2). While the user management interface facilitates the work with information of the ELN user, it also contains management functionalities for the organization of information that has been gained from or provided to other researchers.



Figure 2. Management of collections as a project planning and organization tool of the Chemotion ELN - Left: management of projects and visibility of connections; Right: view within the ELN navigation bar

The core functions of Chemotion ELN

The ELN offers the necessary features for the documentation of chemical projects, including the processing of molecules and reactions. The elements of the ELN are organized in separate lists, e.g., for molecules or reactions, assigned to collections. This allows a clear and arranged structure at a low information level (Fig. 3). The list view is complemented by a summary of the available information on single items such as the availability of data in external databases, the assignment to particular collections, and the status of stored attachments. Additionally, the list view supports swift navigation to activities that are assigned to list items. Another panel with a detailed level of information is visible upon selecting an element. This panel permits the user to view and edit the information. Textual descriptions, additional values, supplemental analytical data, links to external sources, and references are encompassed in several tabbed panels. The element lists and detailed views of the selected elements are built with the functionality of a modern web-based application, facilitating the fast organization of research data through diverse actions such as drag-and-drop, automated sorting of elements, and notifications. The available information and the occurrence of the elements in other projects of the ELN are provided as a link.



Figure 3. Organization of elements (molecules and reactions) in lists - Left: a selected list for molecules and samples with annotations for additional information; Right: a list of reactions with information on reagents, yield and additional notes

Elements of the ELN

The submission of elements such as molecules and reactions is based on the use of an advanced embedded molecule editor derived from Ketcher, an open-source web application.[40] The internal structure of the ELN follows strict rules for the creation of new elements, which results in a differentiated database model having distinct tables for molecules and samples (see Fig. 4 and database relations in Additional file 1). According to this concept, the generation of the molecular structure for a chemical compound requires at least the registration of a molecule. The structure editor is the essential part for the definition of molecules within the ELN, as it generates the connection table. With this information, the International Chemical Identifier (InChI) and InChIKey, a hashed version of the InChI, are generated by OpenBabel. The database molecule table is indexed over the InChIKey values, and a new molecule entry is created only if the unique identifier is not found. In that case, generic information is generated by OpenBabel and complemented by querying the PubChem database. This information comprises the molecule's IUPAC name, the exact mass, and the molecular mass, as well as the SMILES code and Chemical Abstracts Service (CAS) registry number. The molecular structure of the molecule, in combination with the assigned information, then serves as a substantial part for the creation of samples, which are the physical equivalent to the designed molecules. Only samples can be assigned to research actions and reaction plans. The database structure of the sample allows adding more information to a given theoretical molecular structure and includes the properties that depend on a specific experimental case, such as purity. The registration and consequent use of either molecules or samples while working with the Chemotion ELN is the basis for a well-organized and, in the end, reproducible synthetic documentation.
The association of samples to molecules allows the accumulation of information while offering flexibility in the definition of single samples and their visualization. As an example, MDL molfiles are stored both for the sample and its associated generic molecule, giving the opportunity to individually style samples created from the same molecule. A similar procedure is established for the assignment of CAS registry numbers: all available ones are stored with the molecule, allowing the user to select and store one of them with a particular sample (a detailed description of the process is given in Additional file 1). While such a clear differentiation between molecules and samples is not reflected in most of the other chemistry ELNs, this is a central point in the development of the Chemotion ELN.
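
The molecule and sample relationship described above can be illustrated with a minimal Python sketch. This is not the ELN's Ruby on Rails code: the class and function names are hypothetical, the InChIKey is assumed to be computed elsewhere (the text attributes that step to OpenBabel), and an in-memory dictionary stands in for the indexed database table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Molecule:
    """Generic, theoretical structure; one entry per unique InChIKey."""
    inchikey: str
    molfile: str
    cas_numbers: List[str] = field(default_factory=list)


@dataclass
class Sample:
    """Physical batch of a molecule with case-specific properties."""
    molecule: Molecule
    molfile: str                   # own copy, so a sample can be styled individually
    purity: Optional[float] = None
    cas: Optional[str] = None      # one CAS number chosen from the molecule's list


class MoleculeRegistry:
    """Hypothetical stand-in for the molecule table indexed over InChIKey."""

    def __init__(self) -> None:
        self._by_inchikey: Dict[str, Molecule] = {}

    def find_or_create(self, inchikey: str, molfile: str) -> Molecule:
        # A new molecule is created only if the identifier is not found.
        molecule = self._by_inchikey.get(inchikey)
        if molecule is None:
            molecule = Molecule(inchikey=inchikey, molfile=molfile)
            self._by_inchikey[inchikey] = molecule
        return molecule

    def __len__(self) -> int:
        return len(self._by_inchikey)


def create_sample(registry: MoleculeRegistry, inchikey: str, molfile: str,
                  purity: Optional[float] = None) -> Sample:
    """Register the molecule if needed, then attach a new sample to it."""
    molecule = registry.find_or_create(inchikey, molfile)
    return Sample(molecule=molecule, molfile=molfile, purity=purity)
```

Two samples created with the same (placeholder) key share one molecule entry while each keeps its own molfile and purity, which mirrors the one-molecule, many-samples design described in the text.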



Figure 4. Differentiation between a molecule and corresponding samples

The definition of unique experimental samples in contrast to generic molecules is a prerequisite for a systematic documentation and follow-up of particular batches in the synthetic work process. Complemented with a naming of the individual sample that reflects the sample’s ancestry (the labels of descendant samples include the label of the original sample and a systematic batch number), the research workflow in the laboratory can be recorded with the highest accuracy.

The representation of a physically used substance or its preparation in the ELN includes the summary of the available data from the related molecule, allowing for fast availability of all information that is necessary for rapid management of research projects. The automatically provided data, as well as the input given by the user, are organized into three main tabbed panels which consist of:

1. information for a detailed definition of the properties (Fig. 5, left)



Figure 5. Left: detailed view of a properties tab including information of molecule and sample properties; Right: view of the analysis tab of the given sample

2. additional data that can be attached to the uploaded files with research data (Fig. 5, right)

3. results that have been gained with the sample through an external process

Other panels can be added through ELN customization with plugins that provide the user with extended functions:

4. request to SciFinder and a direct connection to the search results, and

5. predicted NMR information via the web service NMRdb[41]

The embedding of SciFinder functions (tab 4) requires the configuration of an ELN plugin which is also available on a public repository. However, the institution-dependent credentials for the SciFinder service need to be configured on the server. User access to SciFinder can be initialized via a change of the ELN settings, where the CAS-provided credentials have to be entered once (Fig. 6). This step automatically generates an access token with 10-day validity.
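
The 10-day token validity mentioned above amounts to a simple expiry check. The following Python sketch illustrates only that arithmetic; the function names are hypothetical, and nothing here reflects how the plugin or the CAS service actually stores or validates tokens.

```python
from datetime import datetime, timedelta

# Validity period as stated in the text.
TOKEN_VALIDITY = timedelta(days=10)


def token_expiry(issued_at: datetime) -> datetime:
    """Moment at which a token issued at `issued_at` stops being valid."""
    return issued_at + TOKEN_VALIDITY


def token_is_valid(issued_at: datetime, now: datetime) -> bool:
    """True while `now` lies inside the validity window (end excluded)."""
    return issued_at <= now < token_expiry(issued_at)
```

Once the check fails, the user would have to repeat the credential step described above to obtain a fresh token.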



Figure 6. Left: Changing the user settings with the SciFinder credentials to obtain a user token with time-limitation; Right: SciFinder tab with results of a database request with four hits identified for the exact structure search

The plugin implements query functions to the CAS SciFinder database according to three different search modes reflecting the SciFinder internal search modes “exact,” “substructure,” and “similarity” search. The hit count of the search results is retrieved with a link to the answer set directing to the SciFinder web application. The history of the latest requests and answers of the current user is also listed. As soon as a molecule search in SciFinder is processed, the results are also given in the list of molecules, indicating the search date, whether the structure is registered in SciFinder or not, and the number of results. The direct visibility of published structures via the ELN allows quick access to information that could previously be retrieved only via the SciFinder page directly. To give a comprehensive overview of the novelty of a researcher’s work and the availability of research data, we additionally implemented an automated procedure to assess the presence of any molecule from the ELN in the PubChem database (NCBI). As with the embedded SciFinder feature, the matching molecules are accessible via a direct link to the PubChem Index of the identified item. The information on the presence of the requested structure in the NCBI database is summarized in the molecule and sample lists (Fig. 7). While the SciFinder search allows a differentiation of the search request according to the user’s preference, the implemented PubChem requests are only executed with the exact structure. Although less flexible with respect to customized search strategies, this limitation allows the requests to be processed automatically and instantly when a new molecular structure is created.
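
An exact-structure lookup of the kind described above could be expressed against PubChem's public PUG REST interface. The text does not say which PubChem interface the ELN uses, so the choice of PUG REST and the helper names below are assumptions. The sketch only builds the request and link URLs and performs no network access.

```python
from urllib.parse import quote

# Base URL of PubChem's PUG REST service (assumed interface, see lead-in).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cid_lookup_url(inchikey: str) -> str:
    """URL that asks PubChem for the compound IDs matching an InChIKey."""
    return f"{PUG_REST_BASE}/compound/inchikey/{quote(inchikey)}/cids/JSON"


def pubchem_compound_page(cid: int) -> str:
    """Direct link to the PubChem page of an identified compound."""
    return f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"
```

Because the key is derived from the full structure, such a request can be issued automatically when a molecule is created, which fits the exact-match-only behavior described in the text.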



Figure 7. Request for presence and accessibility of information to specific molecules via PubChem and embedding of the answer sets in Chemotion ELN

Besides molecules and samples, reactions belong to the main elements that can be generated and managed with the ELN. A reaction is created easily by the addition of information to a reaction template (Fig. 8). The user can assign samples and molecules to the reaction in their distinct function as starting material, reagent, or product. The basic scheme for samples in reactions allows for the addition of the amount of the substances in g (alternatively in mg or µg), in ml (or µl), or the definition of the used compound in mol (mmol) or equivalents. The implemented dependencies between the given information and the molecular weights allow the calculation of all necessary values as long as the basic information is given. The structure of the reaction user interface is very flexible, enabling the exchange of elements at any time via drag-and-drop. Samples that have been assigned to a role as starting material can be changed into reagents during the planning of the reaction. The assignment of samples to particular roles within a reaction affects the calculations, as the equivalents are always calculated with respect to the given amount of starting material, which is set to "1" by default. When several starting materials are entered, either one of them, or a reactant, has to be set by the user as the reference material with 1.0 equivalent.
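
The dependency between masses, molecular weights, and equivalents can be sketched in a few lines of Python. The class and function names are hypothetical and the example is restricted to masses in grams; the ELN's own unit handling (mg, µg, ml with density) is not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Material:
    name: str
    molecular_weight: float  # g/mol
    mass_g: float

    @property
    def amount_mol(self) -> float:
        # n = m / M
        return self.mass_g / self.molecular_weight


def equivalents(materials: List[Material], reference: Material) -> Dict[str, float]:
    """Equivalents of each material relative to the reference (1.0 equivalent)."""
    ref_mol = reference.amount_mol
    return {m.name: m.amount_mol / ref_mol for m in materials}


def mass_for_equivalents(reference: Material, molecular_weight: float,
                         equiv: float) -> float:
    """Mass in g needed to supply `equiv` equivalents of a compound."""
    return reference.amount_mol * equiv * molecular_weight
```

With a reference of 1.0 g at 100 g/mol (0.01 mol), a reagent of 1.0 g at 50 g/mol corresponds to 2.0 equivalents; choosing a different reference material rescales every value.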

A unique feature of the Chemotion ELN is the record of real values in parallel to the data of the originally planned experiments. This allows the accurate documentation of the real experiment while keeping the planned procedure available as a template that can be copied to repeat the experiment in a standard way. The change from target to real values is implemented via a switch from value T to R for each sample. The chemicals that are assigned to the reaction are accessible via a direct link to the detailed level of the sample list. All data and changes that are submitted to the samples (like the density of a chemical) are considered instantly for the calculation of the reaction. The ELN is designed on the one hand to offer as much flexibility as possible but on the other hand to limit user actions that could compromise the integrity of the experimental data. While all parameters of a reaction can be entered and submitted either via the predefined or free text fields within the information panels, such as under the Scheme tab, there are other fields where calculated data are only visible but not editable. An example of the latter limitation is the yield field displayed for reactions. The ability to input a value for the yield of a reaction is disabled in all cases, as the yield should be the result of the gained amount of the product of a reaction. Another feature for the planning of reproducible reactions has been added with a solvent manager. This tool allows the addition of several solvents (via drag-and-drop from the sample list, via drawing and generating a solvent from scratch, or via a selection from a drop-down menu) and volumes, for which the concentration of reagents is estimated automatically and given in the reaction table (Fig. 8).
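
The target/real switch, the derived yield, and the concentration estimate can be sketched as follows. The names are hypothetical, the yield assumes 1:1 stoichiometry against the reference material, and the concentration assumes that solvent volumes simply add up; the text does not specify the formulas used by the ELN.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Amount:
    """Planned (T, target) and recorded (R, real) mass of one sample, in grams."""
    target_g: float
    real_g: Optional[float] = None

    def value(self, mode: str) -> float:
        # "T" returns the planned value, "R" the value actually measured.
        if mode == "T":
            return self.target_g
        if mode == "R":
            if self.real_g is None:
                raise ValueError("no real value recorded yet")
            return self.real_g
        raise ValueError("mode must be 'T' or 'R'")


def reaction_yield(product: Amount, product_mw: float, reference_mol: float) -> float:
    """Yield as a fraction, derived from the real product mass (never typed in)."""
    return (product.value("R") / product_mw) / reference_mol


def concentration_mol_per_l(amount_mol: float,
                            solvent_volumes_ml: Iterable[float]) -> float:
    """Estimated concentration, assuming the solvent volumes are additive."""
    total_l = sum(solvent_volumes_ml) / 1000.0
    return amount_mol / total_l
```

Keeping the yield a derived quantity, as the text describes, means it cannot drift away from the recorded product mass.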



Figure 8. Planning and editing reactions via the Chemotion ELN (left panel); direct connectivity to the sample related data (right panel)

The Chemotion ELN can be used for detailed tracking of samples and reactions thanks to a systematic and automatic identification of all items, including intuitive labeling of the given workflow. Samples that are part of any process within the ELN bear information about their origin and use in their name and short_label descriptors. Samples that have been newly created or that have been generated via the copy of a molecular structure have a simple name consisting of the initials of the ELN user and a sequential number. Samples that are created from those samples are regarded as child samples, which is made visible through the attachment of a child batch number “− 1..− x” to the original label. Samples that appear as a target compound in a reaction additionally gain a reaction label, which allows the direct assignment of this sample to the reaction and its number. Therefore, the systematic reaction name appears in every product, side product, and fraction of the experiment, allowing for the fast identification of analytical results being labeled in the same manner. All samples that are assigned to the type starting material or product are visible via the sample and molecule list, while samples that are assigned to the function reagent are not listed. This allows for a brief representation of the most important information by avoiding overcrowding the interface with repeatedly used standard reagents (e.g., inorganic salts, bases) while still keeping a consistent record of all reagents used. The reaction scheme and the reaction table can be completed by additional information such as name [free text], status [planned, successful, unsuccessful], temperature or time–temperature table [number/adaptable to °C, °F, K], and description [free text]. The addition of a description is supported by several predefined and formatted procedures which might be used for a fast report on a chemical procedure in a standardized manner.
Three other tabbed panels have been implemented for the submission of further information to a reaction. Under the properties tab, the start and end time points of a reaction and the detailed definition of the TLC control can be given. Literature citations can be added to the reaction by typing a title and the corresponding URL in the references tab, which allows the addition of as many references as desired. The last tab, analysis, displays the analytical experiments associated with each of the obtained product samples of the reaction. This allows a clear and straightforward organization of the obtained analytical results even if several isolated compounds have been obtained. The user benefits from several direct export functionalities when working with reactions at the detail level. The information that is distributed over the described four tabbed panels can be summarized in one Word document, or the samples that are used in the reaction can be exported to Excel with one mouse click.
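
The sample labeling scheme described in this section (user initials plus a sequential number, with child batch numbers appended) can be sketched as below. The hyphen separators and function names are assumptions made for illustration; the text does not fix the exact label format.

```python
def new_sample_label(initials: str, counter: int) -> str:
    """Label of a newly created sample: user initials plus a sequential number."""
    return f"{initials}-{counter}"


def child_label(parent_label: str, batch_number: int) -> str:
    """Label of a child sample: the parent label plus a child batch number."""
    return f"{parent_label}-{batch_number}"


def is_descendant(label: str, ancestor_label: str) -> bool:
    """True if `label` was derived, directly or indirectly, from `ancestor_label`."""
    return label.startswith(ancestor_label + "-")
```

Because every descendant label contains the full label of its ancestor, a batch can be traced back to its origin by prefix matching alone.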

Export and import

Exchanging data between different or isolated systems is a critical issue while managing data. For this reason, support for two simple and widely used file formats has been implemented, which allows transferring data for a selection of samples in and out of the ELN as Excel (.xlsx) or SD files (.sdf). The detail level of data to export can be determined by the user via a check box menu (Fig. 9).
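
An SD file record consists of a molfile followed by named data items and a `$$$$` terminator, so a field-selective export like the one described above can be sketched as follows. The function name and field names are hypothetical, and the code illustrates the file layout only, not the ELN's exporter.

```python
from typing import Dict, List


def sdf_record(molfile: str, fields: Dict[str, object], selected: List[str]) -> str:
    """Build one SD file record containing only the selected data fields."""
    lines = [molfile.rstrip("\n")]
    for name in selected:
        if name in fields:
            lines.append(f"> <{name}>")     # data header line
            lines.append(str(fields[name]))  # data value
            lines.append("")                 # blank line closes the data item
    lines.append("$$$$")                     # record terminator
    return "\n".join(lines) + "\n"


def sdf_file(records: List[str]) -> str:
    """Concatenate records into the content of a multi-record .sdf file."""
    return "".join(records)
```

The `selected` list plays the role of the check box menu: fields that are not ticked never reach the exported file.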



Figure 9. Export scheme allowing the selection of single items to be exported

Sharing of information

The Chemotion ELN was equipped with two functions for sharing information with other ELN users. These tools complete the functionality of exporting and importing information, allowing for detailed visibility of the obtained research data directly through the ELN. Both operation models, called sharing and synchronization, are accessible through a user interface that allows the organization of single colleagues or groups according to their status and desired access policies (Fig. 10, right). The ELN user and owner of the submitted data sets the level of permission for the recipient, or group, either by choosing a standard role or by selecting more detailed information levels. The permission levels for allowed actions range from a simple read policy to a take-ownership policy. The detailed level of what data can be accessed for the samples and reactions can also be limited to a few fields. User groups are easily defined to facilitate the sharing of research activity with a larger community (Fig. 10, left).



Figure 10. Left: creation of new user groups; Right: definition of user groups and assignment of sharing role, permission level and available detail level for single users and user groups

Though the selection of the user role and rights is the same for the sharing and synchronizing tools, the two options differ in how current the provided research data are. Through the "sharing" of a collection, a fixed set of samples and reactions is made accessible to others with, if desired, the ability for the recipients to edit the contained elements. The actions read, write, share, delete, import elements, or take ownership, depending on the access policy, can be used, but new elements cannot be added. This is, however, feasible when using "synchronized" collections. Synchronized collections are created to allow permanent access of other ELN users to the chosen set of research data, including the visibility (and modification) of changes that have been made after the synchronization.
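
The difference between shared and synchronized collections can be modeled with an ordered permission level and a flag. This is an illustrative sketch only: the strictly increasing order of the levels and the requirement of write permission for adding elements are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Permission(IntEnum):
    # Assumed ordering, from the simple read policy up to take-ownership.
    READ = 0
    WRITE = 1
    SHARE = 2
    DELETE = 3
    IMPORT_ELEMENTS = 4
    TAKE_OWNERSHIP = 5


@dataclass
class SharedCollection:
    elements: List[str]
    permission: Permission
    synchronized: bool = False

    def can(self, action: Permission) -> bool:
        """An action is allowed if it does not exceed the granted level."""
        return action <= self.permission

    def add_element(self, element: str) -> None:
        # Shared collections expose a fixed set; only synchronized ones grow.
        if not self.synchronized:
            raise PermissionError("shared collections hold a fixed set of elements")
        if not self.can(Permission.WRITE):
            raise PermissionError("write permission required")
        self.elements.append(element)
```

A recipient with write access can edit elements in both cases, but only the synchronized collection accepts elements added after the collection was handed over.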

Search functions

One of the main arguments for the management of research data with an ELN is the digital availability of information. The digital availability offers the possibility to search for data and information if the organization and maintenance of the ELN supports that in a suitable way. The Chemotion ELN allows text and structure search within the diverse contents of the ELN. The search of either text fragments or chemical structures can be further limited to distinct elements (samples, reactions) to facilitate the evaluation of the results. The text-based search uses the PostgreSQL trigram module (pg_trgm) for alphanumeric trigram matching to seek the presence of text or formula fragments in samples. Most of the non-numeric properties of the samples, such as name, molecular formula, IUPAC name, InChI string, and canonical SMILES, are searched. The associated content in reactions is filtered based on the search result. The search for structures can be performed either as a substructure search or as a similarity search, both of which are fingerprint-based methods. We implemented a path-based fingerprint method, referred to as FP2 in OpenBabel. This fingerprint is similar to Daylight fingerprints, which are used as a standard for benchmarking in many publications, and it is also used to calculate molecule similarity using the Tanimoto coefficient.[42] The minimum similarity threshold can be defined through the ELN interfaces (Fig. 11).
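
Both search strategies rest on set overlap, which the following Python sketch makes explicit. The trigram functions are a simplified reimplementation of the idea behind pg_trgm (lowercasing, padding each word with two leading spaces and one trailing space), not the module itself, and the fingerprints are represented as plain sets of bit positions rather than real FP2 output.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Three-character fragments of each alphanumeric word, padded pg_trgm-style."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of set bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similar_enough(fp_a: Set[int], fp_b: Set[int], threshold: float) -> bool:
    """Apply a user-defined minimum similarity threshold."""
    return tanimoto(fp_a, fp_b) >= threshold
```

For two fingerprints that share two of four distinct bits, the coefficient is 0.5, so a threshold of 0.7 would exclude the pair while a threshold of 0.5 would keep it.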



Figure 11. Search functions with structure and substructure search (adaptable through similarity search)

Codes and tracking

The management options of the Chemotion ELN are complemented by barcode and QR code tracking of single elements and items. This feature, often offered with laboratory information management systems (LIMS), is implemented for reactions, samples, and analyses. Parallel to the creation of each of the latter items, a Universally Unique Identifier (UUID) version 4 is registered. The ELN provides a QR code or a truncated barcode representation of the associated identifier, allowing for flexible labeling. Analyses associated with samples are also assigned a UUID. Procedures to generate PDF files of the codes for faster printing in different sizes have been implemented, and they render the QR code, the barcode, and the assigned Sample ID (Fig. 12). Using a webcam or a specific code reading device, the user can scan the code and navigate directly to the associated element in the ELN.
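
Generating a version 4 UUID and deriving a shorter barcode text from it can be sketched with Python's standard `uuid` module. The truncation rule (first 12 hexadecimal digits, upper case) and the function names are assumptions; the text states only that the barcode is a truncated representation of the identifier.

```python
import uuid


def new_element_code() -> uuid.UUID:
    """Random (version 4) UUID registered alongside a new element."""
    return uuid.uuid4()


def qr_payload(identifier: uuid.UUID) -> str:
    """Full identifier, suitable for encoding in a QR code."""
    return str(identifier)


def barcode_text(identifier: uuid.UUID, length: int = 12) -> str:
    """Hypothetical truncated form for a linear barcode with less capacity."""
    return identifier.hex[:length].upper()
```

A truncated code is no longer guaranteed to be unique, so a lookup by barcode would need to handle the rare case of several matching elements.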



Figure 12. Barcode and QR code generation, printing and tracking of samples via code reading

Evaluation of the ELN and user feedback

The development of the Chemotion ELN is the result of a long-lasting process within our work group aiming for the installation of software that fulfills the requirements of a modern, fast, and flexible infrastructure. The ELN is used in our group by master's students, PhD students, and technicians. Continuous integration and deployment provide the users with the latest developments, changes, and corrections on a frequent basis (at least once a week). In this manner, the ELN is constantly checked and evaluated, allowing for the rapid identification of errors and missing features. New feature requests or suggestions are entered by selected users via an internal GitLab CE portal and are prioritized according to urgency and users’ upvoting. The users’ feedback reveals roughly two groups: users who have tested or used other ELNs before and those who are using an ELN for the first time.

For the first user group, the feedback is consistently positive, and the training time for such an experienced user is short. This group has remarked positively on the fast and convenient way to search items (samples/reactions) and the clear overview of all data that can be adapted to the user’s preferences. Members of this group extensively use features for storing NMR spectra (along with the associated experiments) and for sharing results, reactions, and whole collections of entries with colleagues. When asked about the main differences compared to other systems, they emphasize a better and more sustainable accessibility to their data because use of the ELN is not limited to the availability of specific additional software and can be accessed independent of the platform. In regard to formerly used ELNs, the risk of not being able to access the data any more as a result of software or hardware problems was discussed frequently; however, the Chemotion ELN was very successful in providing confidence in the accessibility of digital data. The latter argument is particularly interesting because it stands in contrast to the opinion of the user group without ELN experience. One of the strongest arguments these users raise against using the ELN is the fear that the system could be compromised from outside and that research data could be stolen or deleted.

The non-experienced ELN users need more time to become familiar with digital reporting in general; for example, they need to understand the distinction between molecules and samples and how it is used within the ELN. For these users, teaching or mentoring by more experienced users is vital to becoming familiar with all functions. We tried to promote the use and functions of the software to the students with a manual that includes illustrative examples and screenshots of all features. It turned out that such a written manual does little to raise user interest. Functions valued by all users include the SciFinder function, the PubChem link, and the retrieval of CAS registry numbers. These functions allow rapid retrieval of additional information on compounds, possible reactions, or properties and are therefore in high demand. How individuals use the ELN depends strongly on their preferences and on the equipment of the laboratory in general. The majority of users appreciate the availability of their data wherever they are, although this of course depends on internet access and a VPN connection. This availability allows them to be more flexible in their time management because reviewing data and collecting information and additional documentation from different workplaces can be done at any time. As all PhD students, master's students, and technicians spend most of their working time in the laboratory, the ELN is mainly used directly in the chemistry lab, and all users access the ELN via either a personal notebook computer or a desktop PC that is provided. None of the current researchers uses the ELN via a tablet or smartphone (although there are no technical limitations). This is because an important advantage of the ELN is the direct and connected visibility of datasets and information, and this visibility is partly lost on smaller screens.

The ELN users are often asked whether they still need to write paper-based notes and descriptions. At this stage, the ELN does not include connectivity to devices; therefore, everyone still needs to create handwritten documentation to some extent, at least to record information from external instruments like balances.

Conclusion

We present the development of an open-source electronic lab notebook (ELN) for researchers who work in the field of chemical sciences, making allowance for the growing dependency of scientific activity on the availability of digital information. The web-based application, which has already been implemented in daily laboratory work, allows the acquisition, management, storage, processing, and sharing of chemistry research data. The ELN, as an example of a modern and powerful research infrastructure, provides tools for communicating and sharing the recorded data. It facilitates research by offering access to various functions, helper tools, and external sources. In addition, it allows for one of the most important improvements regarding scientific workflow: it will enable chemistry researchers in academia to build their own databases of digital information, which is a prerequisite for the detailed, systematic investigation and evaluation of chemical reactions and mechanisms. However, many features that are necessary to meet all needs for chemistry research are not implemented yet and will be part of further developments. Examples of those work-in-progress features are (a) a document generation function that creates and archives projects as either a report or the supporting information for a publication, (b) the implementation of queries to additional chemistry databases like ChemSpider, and (c) the development of an API to a chemistry repository that will allow the direct transfer of research data to an online portal with global access. Development is still ongoing, and novel ideas for additional features are discussed daily with the programmers for future implementations. On a broader scope, additional functionality has been requested by researchers who work in the field of biology but cooperate closely with chemists. In the future, the ELN should be usable as a platform that allows for the sharing of information on molecules for research on a common project. Although adaptations and extensions of the current software version are needed to meet those requirements, preliminary results already show significant applicability of the ELN in an interdisciplinary work environment.

Abbreviations

ELN: electronic laboratory notebook

MVC: model-view-controller

DOM: document object model

CSS: cascading style sheets

DB: database

NCBI: National Center for Biotechnology Information (US)

UUID: Universally Unique Identifier

VPN: virtual private network

Declarations

Authors’ contributions

PT and FH developed the main structure of the software described herein; PT implemented and developed the SciFinder plugin and managed the detailed conception. DSL was involved in preliminary discussions and worked on the reactions table. AN implemented requests to NMRdb and search functions, and YCH worked on the information input and visualization for reactions. SK implemented the Ketcher editor and adapted the ELN concerning necessary changes. NJ and SB are corresponding authors of this publication; they planned the overall structure and requirements of the ELN and did the conception. All authors read and approved the final manuscript.

Acknowledgements

We acknowledge support by Deutsche Forschungsgemeinschaft and the Open Access Publishing Fund of Karlsruhe Institute of Technology. This work was supported by the Helmholtz program Biointerfaces in Technology and Medicine (BIFTM). We are very thankful to the members of the Stefan Bräse group, who contributed manifold suggestions to the continuous improvement of the ELN, and to the companies NinjaConcept and Cubuslab and their members Julian Lübke and Marco Sehrer, who contributed ideas to the project. We are grateful for permission to request information from NMRdb and SciFinder and want to thank Luc Patiny and Karin Färber.

Competing interests

Florian Hübsch works at ninjaconcept, the company that developed parts of the herein described open-source software.

Availability of data and materials

The Supporting Information covers technical aspects and details of the software and programming, the installation requirements, and the details of the Docker file that includes the installation environment. A simplified database entity-relationship model is depicted, and features of the ELN are summarized in a table. The Supporting Information also provides comprehensive information about the web user interface and a detailed explanation of all images.

Availability and requirements

Project name: Chemotion_ELN

Project home page: https://github.com/ComPlat/chemotion_ELN

Operating system(s): platform-independent access, developed/tested on Linux and Mac, deployed on Linux

Other requirements: Modern internet browser supporting HTML5 and JavaScript; recommended browsers: Chrome, Firefox (IE not supported)

Programming language: JavaScript, Ruby, CSS (> 3%), HTML

License: GNU AGPL v3.0 (Affero General Public License version 3)

Funding

This project has been funded by the German Research Foundation (Deutsche Forschungsgemeinschaft).

Additional files

Additional file 1: Supporting information including (1) technical aspects and details covering the software development and the Docker file, (2) database entity relationship diagram, (3) summary of features according to the main modules, and (4) procedures in pictures

References

  1. Winkler-Nees, S. (2013). "Status of Discussion and Current Activities: National Developments". In Neuroth, H.; Strathmann, S.; Oßwald, A.; Ludwig, J.. Digital Curation of Research Data: Experiences of a Baseline Study in Germany. Universitätsverlag Göttingen. pp. 18–36. ISBN 9783864880544. https://books.google.com/books?id=Nf35AgAAQBAJ&pg=PA18&lpg=PA18. 
  2. Stajich, J.E.; Lapp, H. (2006). "Open source tools and toolkits for bioinformatics: Significance, and where are we?". Briefings in Bioinformatics 7 (3): 287–96. doi:10.1093/bib/bbl026. PMID 16899494. 
  3. Owens, B. (2016). "Data sharing: Access all areas". Nature 533 (7602): S71–2. doi:10.1038/533S71a. PMID 27167398. 
  4. Pirhadi, S.; Sunseri, J.; Koes, D.R. (2016). "Open source molecular modeling". Journal of Molecular Graphics & Modeling 69: 127–43. doi:10.1016/j.jmgm.2016.07.008. PMC PMC5037051. PMID 27631126. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037051. 
  5. Segler, M.H.S.; Waller, M.P. (2017). "Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction". Chemistry 23 (25): 5966–5971. doi:10.1002/chem.201605499. PMID 28134452. 
  6. Christ, C.D.; Zentgraf, M.; Kriegl, J.M. (2012). "Mining electronic laboratory notebooks: analysis, retrosynthesis, and reaction based enumeration". Journal of Chemical Information and Modeling 52 (7): 1745–56. doi:10.1021/ci300116p. PMID 22657734. 
  7. NA (2009). "Data's shameful neglect". Nature 461 (7261): 145. doi:10.1038/461145a. PMID 19741659. 
  8. Bird, C.L.; Frey, J.G. (2013). "Chemical information matters: An e-Research perspective on information and data sharing in the chemical sciences". Chemical Society Reviews 42 (16): 6754–76. doi:10.1039/c3cs60050e. PMID 23686012. 
  9. Alsheikh-Ali, A.A.; Qureshi, W.; Al-Mallah, M.H.; Ioannidis, J.P. (2011). "Public availability of published research data in high-impact journals". PLoS One 6 (9): e24357. doi:10.1371/journal.pone.0024357. PMC PMC3168487. PMID 21915316. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168487. 
  10. Szymkuć, S.; Gajewska, E.P.; Klucznik, T. et al. (2016). "Computer-Assisted Synthetic Planning: The End of the Beginning". Angewandte Chemie 55 (20): 5904–37. doi:10.1002/anie.201506101. PMID 27062365. 
  11. Borgman, C.L. (2012). "The conundrum of sharing research data". Journal of the Association for Information Science and Technology 63 (6): 1059–1078. doi:10.1002/asi.22634. 
  12. Ghosh, S.; Matsuoka, Y.; Asai, Y. et al. (2011). "Software for systems biology: From tools to integrated platforms". Nature Reviews Genetics 12 (12): 821–32. doi:10.1038/nrg3096. PMID 22048662. 
  13. Butler, D. (2017). "Gates Foundation announces open-access publishing venture". Nature 543 (7647): 599. doi:10.1038/nature.2017.21700. PMID 28358109. 
  14. Lawrence, K. (2017). "Open Access is Evolving and ChemistryOpen is Too!". ChemistryOpen 6 (1): 3–4. doi:10.1002/open.201600165. PMC PMC5288765. PMID 28168141. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5288765. 
  15. "SciFinder". Chemical Abstracts Service. http://www.cas.org/products/scifinder. Retrieved 2017. 
  16. "Reaxys". Elsevier. https://www.elsevier.com/solutions/reaxys. Retrieved 2017. 
  17. "biosistemika/scinote-web". GitHub, Inc. https://github.com/biosistemika/scinote-web. Retrieved 2017. 
  18. "BIOVIA Electronic Laboratory Notebooks". Dassault Systemes. http://accelrys.com/products/unified-lab-management/biovia-electronic-lab-notebooks/. Retrieved 2017. 
  19. Rees, I.; Langley, E.; Chiu, W.; Ludtke, S.J. (2013). "EMEN2: An object oriented database and electronic lab notebook". Microscopy and Microanalysis 19 (1): 1–10. doi:10.1017/S1431927612014043. PMC PMC3907281. PMID 23360752. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3907281. 
  20. Barillari, C.; Ottoz, D.S.M.; Fuentes-Serna, J.M. et al. (2016). "openBIS ELN-LIMS: An open-source database for academic laboratories". Bioinformatics 32 (4): 638–640. doi:10.1093/bioinformatics/btv606. PMC PMC4743625. PMID 26508761. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4743625. 
  21. "LabFolder". LabFolder GmbH. https://www.labfolder.com/. Retrieved 2017. 
  22. Zeng, J.; Hillman, M.; Arnold, M. (2011). "Impact of the implementation of a well-designed electronic laboratory notebook on bioanalytical laboratory function". Bioanalysis 3 (13): 1501–11. doi:10.4155/bio.11.116. PMID 21728774. 
  23. Beato, B.; Pisek, A.; White, J. et al. (2011). "Going paperless: Implementing an electronic laboratory notebook in a bioanalytical laboratory". Bioanalysis 3 (13): 1457–70. doi:10.4155/bio.11.117. PMID 21702721. 
  24. Rubacha, M.; Rattan, A.K.; Hosselet, S.C. (2011). "A review of electronic laboratory notebooks available in the market today". Journal of Laboratory Automation 16 (1): 90–98. doi:10.1016/j.jala.2009.01.002. PMID 21609689. 
  25. Taylor, K.T. (2006). "The status of electronic laboratory notebooks for chemistry and biology". Current Opinion in Drug Discovery & Development 9 (3): 348–53. PMID 16729731. 
  26. van Eikeren, P. (2004). "Intelligent Electronic Laboratory Notebooks for Accelerated Organic Process R&D". Organic Process Research & Development 8 (6): 1015–23. doi:10.1021/op049890j. 
  27. Achor, Z.; Ladboeur, T.; Gien, O. et al. (2004). "Sanofi-Synthelabo Chemical Development and the Development of an Electronic Laboratory Notebook". Organic Process Research & Development 8 (6): 983–97. doi:10.1021/op040012v. 
  28. Walsh, E.; Choi, I. (2013). "Using Evernote as an electronic lab notebook in a translational science laboratory". Journal of Laboratory Automation 18 (3): 229–34. doi:10.1177/2211068212471834. PMID 23271786. 
  29. Goddard, N.H.; Macneil, R.; Ritchie, J. (2009). "eCAT: Online electronic lab notebook for scientific research". Automated Experimentation 1: 4. doi:10.1186/1759-4499-1-4. 
  30. Bird, C.L.; Willoughby, C.; Frey, J.G. (2013). "Laboratory notebooks in the digital era: The role of ELNs in record keeping for chemistry and other sciences". Chemical Society Reviews 42 (20): 8157–75. doi:10.1039/c3cs60122f. PMID 23864106. 
  31. Voegele, C.; Bouchereau, B.; Robinot, N. et al. (2013). "A universal open-source electronic laboratory notebook". Bioinformatics 29 (13): 1710–2. doi:10.1093/bioinformatics/btt253. PMID 23645817. 
  32. Coles, S.J.; Frey, J.G.; Bird, C.L. et al. (2013). "First steps towards semantic descriptions of electronic laboratory notebook records". Journal of Cheminformatics 5 (1): 52. doi:10.1186/1758-2946-5-52. PMC PMC3878183. PMID 24360292. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3878183. 
  33. "E-Notebook for Chemistry". PerkinElmer, Inc. https://www.cambridgesoft.com/Ensemble_for_Chemistry/ENotebookforChemistry/. Retrieved 2017. 
  34. "epam/Indigo". GitHub, Inc. https://github.com/epam/Indigo. Retrieved 2017. 
  35. Day, A.E.; Coles, S.J.; Bird, C.L. et al. (2015). "ChemTrove: Enabling a generic ELN to support chemistry through the use of transferable plug-ins and online data sources". Journal of Chemical Information and Modeling 55 (3): 501–9. doi:10.1021/ci5005948. PMID 25679543. 
  36. Frey, J.G.; Coles, S.J.; Bird, C.L. et al. (2014). "Sample management with the LabTrove ELN". Conference Proceedings from the 247th ACS National Meeting & Exposition. https://eprints.soton.ac.uk/368133/. 
  37. Willoughby, C.; Bird, C.L.; Coles, S.J.; Frey, J.G. (2014). "Creating context for the experiment record. User-defined metadata: investigations into metadata usage in the LabTrove ELN". Journal of Chemical Information and Modeling 54 (12): 3268–83. doi:10.1021/ci500469f. PMID 25405258. 
  38. Rudolphi, F.; Goossen, L.J. (2012). "Electronic laboratory notebook: The academic point of view". Journal of Chemical Information and Modeling 52 (2): 293–301. doi:10.1021/ci2003895. PMID 22077095. 
  39. Lütjohann, D.S.; Jung, N.; Bräse, S. (2015). "Open source life science automation: Design of experiments and data acquisition via “dial-a-device”". Chemometrics and Intelligent Laboratory Systems 144: 100–107. doi:10.1016/j.chemolab.2015.04.002. 
  40. "epam/ketcher". GitHub, Inc. https://github.com/epam/ketcher. Retrieved 2017. 
  41. Banfi, D.; Patiny, L. (2008). "Resurrecting and Processing NMR Spectra On-line". CHIMIA 62 (4): 280–281. doi:10.2533/chimia.2008.280. 
  42. Bajusz, D.; Rácz, A.; Héberger, K. (2015). "Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?". Journal of Cheminformatics 7: 20. doi:10.1186/s13321-015-0069-3. PMC PMC4456712. PMID 26052348. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4456712. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In some cases, the authors directly referenced a citation number; the author and year of the citation was inserted along with the citation for completeness.