Journal:MASTR-MS: A web-based collaborative laboratory information management system (LIMS) for metabolomics

From LIMSWiki
Revision as of 21:19, 17 April 2017 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title MASTR-MS: A web-based collaborative laboratory information management system (LIMS) for metabolomics
Journal Metabolomics
Author(s) Hunter, A.; Dayalan, S.; De Souza, D.; Power, B.; Lorrimar, R.; Szabo, T.; Nguyen, T.; O'Callaghan, S.; Hack, J.; Pyke, J.; Nahid, A.; Barrero, R.; Roessner, U.; Likic, V.; Tull, D.; Bacic, A.; McConville, M.; Bellgard, M.
Author affiliation(s) Murdoch University, The University of Melbourne, The Australian Wine Research Institute
Primary contact Email: malcolmm at unimelb dot edu dot au -or- mbellgard at ccg dot murdoch dot edu dot au
Year published 2017
Volume and issue 13 (2)
Page(s) 14
DOI 10.1007/s11306-016-1142-2
ISSN 1573-3890
Distribution license Creative Commons Attribution 4.0 International
Website https://link.springer.com/article/10.1007/s11306-016-1142-2
Download https://link.springer.com/content/pdf/10.1007%2Fs11306-016-1142-2.pdf (PDF)

Abstract

Background

An increasing number of research laboratories and core analytical facilities around the world are developing high throughput metabolomic analytical and data processing pipelines that are capable of handling hundreds to thousands of individual samples per year, often over multiple projects, collaborations and sample types. At present, no laboratory information management system (LIMS) is specifically tailored for metabolomics laboratories, capable of tracking samples and associated metadata from the beginning to the end of an experiment (including data processing and archiving), and also suitable for use in large institutional core facilities or multi-laboratory consortia as well as single-laboratory environments.

Results

Here we present MASTR-MS, a downloadable and installable LIMS solution that can be deployed either within a single laboratory or used to link workflows across a multisite network. It comprises the Node Management System that can be used to link and manage projects across one or multiple collaborating laboratories; the User Management System which defines different user groups and privileges of users; the Quote Management System where client quotes are managed; the Project Management System in which metadata is stored and all aspects of project management, including experimental setup, sample tracking and instrument analysis, are defined; and the Data Management System that allows the automatic capture and storage of raw and processed data from the analytical instruments to the LIMS.

Conclusion

MASTR-MS is a comprehensive LIMS solution specifically designed for metabolomics. It captures the entire lifecycle of a sample, starting from project and experiment design to sample analysis, data capture and storage. It acts as an electronic notebook, facilitating project management within a single laboratory or a multi-node collaborative environment. This software is being developed in close consultation with members of the metabolomics research community. It is freely available under the GNU GPL v3 license and can be accessed from https://muccg.github.io/mastr-ms/.

Keywords

MASTR-MS, metabolomics, LIMS, omics

Introduction

Metabolomic approaches aim to detect and quantitate levels of all small molecules in a biological system and, together with other "omic" approaches, can be used to generate a systems-wide understanding of biological processes. Metabolomic approaches typically involve the use of advanced mass spectrometry and nuclear magnetic resonance (NMR) spectrometry platforms to maximize coverage of the chemically diverse metabolites that make up biological systems. In many cases, these analytical platforms are located in institutional and/or national core facilities that offer a range of metabolomics capabilities to researchers.[1][2][3][4][5] These core facilities, as well as individual research groups with sophisticated metabolomics infrastructure and capability, are faced with the challenge of tracking large numbers of samples and the associated metadata, and linking this information with the raw datasets generated by multiple analytical platforms, as well as processed downstream data sets. Data handling extends beyond collection and curation of raw data to the management of metadata that defines how the raw data is generated. Major funding agencies, such as Europe’s Horizon 2020[5], the NIH[6], The Wellcome Trust[7] and Australia’s NHMRC[8], have established data management plans that researchers are expected to follow in order to capture, store and share data generated by their grants. Scientific journals are also increasingly requesting that experimental data and metadata associated with metabolomics experiments be made available to the scientific community[9][10], leading to the establishment of data repositories, such as MetaboLights[11] and Metabolomics Workbench.[12]

LIMS are software solutions that aim to manage the entire workflow of a laboratory. A number of LIMS have been developed or adapted from other applications for curating metabolomics experiments and data management (e.g., SetupX, Sesame). While these LIMS have features that allow capture of project metadata, experiments and samples, data storage, and data sharing, they exhibit a number of limitations around their capacity to accommodate different vendor instruments and have restricted functionalities to facilitate a collaborative configuration between geographically distributed laboratories. In this paper we present MASTR-MS, the first wholly functional, open-source LIMS solution specifically designed for metabolomics laboratories.

Materials and methods

MASTR-MS runs as a Python[13] web application built on the Django[14] framework, utilising a PostgreSQL[15] or MySQL[16] relational database. MASTR-MS leverages the functionality of the Django framework for user management, user permissions and security. Django is a mature web framework and provides multiple security tools and mechanisms. For example, specific protection is provided against cross-site scripting (XSS), cross-site request forgery (CSRF), SQL injection and clickjacking. A security middleware is also used to enforce SSL/HTTPS for all traffic. MASTR-MS is built using open-source components and communicates using open standards. The client-side browser interface leverages JavaScript and AJAX for fluid data display and submission, giving a user experience much like a desktop application, but with the flexibility of being available from any internet-connected location on any operating system, with no client-side download or installation.
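As a rough illustration of the protections listed above, the fragment below shows how they are typically switched on in a recent Django settings module. It is a sketch, not the MASTR-MS configuration: the setting and middleware names are standard Django ones, but the chosen values, and the assumption of a Django release recent enough to provide them, are ours.

```python
# Illustrative Django security settings (assumed values, not taken from the
# MASTR-MS code base).

MIDDLEWARE = [
    # Applies the SECURE_* settings below, including the HTTPS redirect.
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    # Rejects unsafe requests (POST, PUT, ...) that lack a valid CSRF token.
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # Sends the X-Frame-Options header to mitigate clickjacking.
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]

SECURE_SSL_REDIRECT = True    # redirect plain HTTP requests to HTTPS
SESSION_COOKIE_SECURE = True  # send the session cookie over HTTPS only
CSRF_COOKIE_SECURE = True     # send the CSRF cookie over HTTPS only
X_FRAME_OPTIONS = "DENY"      # never allow the site to be rendered in a frame
```

XSS and SQL injection protection need no dedicated settings: Django templates escape variables by default, and the ORM issues parameterized queries.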

The DataSync Client is a small desktop application that runs on an instrument’s acquisition computer. This software constantly communicates with the MASTR-MS server and is responsible for transferring raw data from the acquisition computer to the MASTR-MS repository (Supplemental Fig. S9A). The DataSync Client is written in the Python programming language using the wxWidgets[17] GUI library and runs on Windows and Linux systems. Data is uploaded using the rsync protocol[18] and the libraries and plugins required for this are included in the installation package.
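The upload step can be pictured as a thin wrapper around the rsync command line. The sketch below is an assumption about how such a wrapper might look, written for current Python 3 rather than Python 2.7; the function names, host, user and paths are hypothetical, and the actual DataSync Client may invoke rsync differently.

```python
import subprocess
from typing import List


def build_rsync_command(local_dir: str, user: str, host: str, remote_dir: str) -> List[str]:
    """Build an rsync command that mirrors local_dir into remote_dir on host."""
    # A trailing slash makes rsync copy the folder's contents, not the folder itself.
    source = local_dir.rstrip("/") + "/"
    destination = f"{user}@{host}:{remote_dir}"
    return [
        "rsync",
        "--archive",   # recurse and preserve timestamps and permissions
        "--compress",  # compress data in transit
        "--partial",   # keep partially transferred files so large uploads can resume
        source,
        destination,
    ]


def upload(local_dir: str, user: str, host: str, remote_dir: str) -> int:
    """Run the transfer and return rsync's exit code (0 means success)."""
    return subprocess.run(build_rsync_command(local_dir, user, host, remote_dir)).returncode


# Hypothetical acquisition folder and server; nothing is transferred here.
command = build_rsync_command("/data/run_42", "datasync", "mastrms.example.org", "/repo/run_42")
print(" ".join(command))
```

Because rsync only sends files that are new or changed, repeated calls on the same folder are cheap, which suits a client that polls the acquisition computer continuously.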

As the MASTR-MS server-side component is written in the Python 2.7 programming language, any operating system that has Python 2.7 available for running web applications with a web server can run the application. In practice, the application has only been tested on the Linux operating system and the Apache web server. For installation, operating system packages are available in RPM format for CentOS 6.5. Similarly, as the DataSync Client is also written in Python 2.7, it can run on any operating system that has Python 2.7 available. However, it is typically installed on a Windows platform with a connected analytical instrument. For this reason, the DataSync Client is distributed as a Windows executable (.exe) installer. The DataSync Client application is also self-updating by means of a user option to upgrade to a newer version if available.

Results

MASTR-MS is a web-based LIMS solution for metabolomics laboratories. The different modules of MASTR-MS allow users to:

  • Track all metabolomics samples and associated meta-, analytical- and processed data sets. This starts from the capture of client/collaborator communication; the establishment of new projects, experimental design and sample definitions; and the automatic capture of raw data generated by the instruments.
  • Develop an electronic notebook, where users record all relevant information about projects and experiments in MASTR-MS, thus allowing multiple users to work on the same project.
  • Methodically manage the vast amount of data generated by the analytical instruments, by associating it with the project, experiment and sample details.
  • Facilitate collaboration between geographically distributed laboratories through the sharing of projects and experiment data.

MASTR-MS is equally suited to a large core facility and to a single- or multi-laboratory environment, so large national facilities and small individual laboratories can both benefit from using it.

MASTR-MS comprises five major modules: (1) the Node Management System, (2) the User Management System, (3) the Quote Management System, (4) the Project Management System and (5) the Data Management System. Figure 1 shows the workflow of MASTR-MS using the different functionalities and features. These functions are described in detail below. On logging in to MASTR-MS, the user first sees the dashboard, where the available functions are tailored to the user's level of access. The dashboard gives an “at-a-glance” summary of recent activity on the site and items requiring attention. Depending on the user’s status/level of access, the dashboard shows pending user requests, quotes requiring attention, recently created/modified projects, and recently created/modified experiments.



Figure 1. Overview of MASTR-MS system workflow

Node management system

This module allows multiple laboratories to be added to a single MASTR-MS network. For example, a group of geographically dispersed laboratories can have a single deployment of MASTR-MS and share projects and experiments. The module establishes such a setup by generating a separate node for each laboratory. Alternatively, MASTR-MS can be used within a single laboratory environment, in which case this module would comprise a single node.

User management system

This module defines the different user groups used in MASTR-MS. Each user group has different privileges and permissions to access the different functionalities of MASTR-MS. In addition, this module allows the generation and management of users of the system. MASTR-MS has several user groups.

Systems administrator

This user group has access to all functionalities of MASTR-MS. There would normally be one assigned Systems Administrator who would act as the query point for all other users accessing the system, although it is possible to have more than one Systems Administrator. The Systems Administrator has a Laboratory Name assigned to their account (like all other users), allowing a nominated user, usually a member of the organization/laboratory that is hosting the project, to act as the Systems Administrator. The Systems Administrator can add new users to the system, assign user groups to any users in any laboratory, edit details of users and delete users of any laboratory.

Administrator

This user group has full access to all projects, experiments and experimental data, user accounts and quotes within MASTR-MS, regardless of node. This user group allows selected users to view all projects and experiments across different nodes, allowing seamless sharing and collaboration of data across nodes. Where multiple laboratories have a single MASTR-MS deployment, but prefer not to share projects and experiments, no users would be assigned the Administrator role.

Node representative

This user group has full access to quotes for their node, and its members are the preferred contacts for quotes and projects run by this node (detailed further in the "Quote management system" section). In a multi-node setup there would typically be at least one user assigned to this group per node.

Project leader

This user group is able to create new projects and experiments for their node. Additionally, this group is able to assign staff to specific projects and experiments.

Staff

Users of this group are able to participate in the projects and experiments for their node.

Client

All other users of the system are clients. This group has no privileges other than viewing the progress of projects to which they have been assigned.

Any user of the system can update their own user record and change their password at any time.
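The group rules above can be summarized as a small permission check. The sketch below is illustrative only: the class and function names are hypothetical, the Systems Administrator group is omitted for brevity, and treating node membership as sufficient for staff to view a project is a simplifying assumption rather than documented MASTR-MS behaviour.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Group(Enum):
    ADMINISTRATOR = "Administrator"
    NODE_REPRESENTATIVE = "Node Representative"
    PROJECT_LEADER = "Project Leader"
    STAFF = "Staff"


# Groups whose members work within a node; everyone else is treated as a client.
NODE_GROUPS = frozenset({Group.NODE_REPRESENTATIVE, Group.PROJECT_LEADER, Group.STAFF})


@dataclass
class User:
    name: str
    node: str
    groups: Set[Group] = field(default_factory=set)


@dataclass
class Project:
    title: str
    node: str
    client: str = ""  # name of the client linked to the project, if any


def can_create_project(user: User, node: str) -> bool:
    """Administrators may create anywhere; Project Leaders only in their own node."""
    if Group.ADMINISTRATOR in user.groups:
        return True
    return Group.PROJECT_LEADER in user.groups and user.node == node


def can_view_project(user: User, project: Project) -> bool:
    """Administrators see every node; clients see only projects linked to them."""
    if Group.ADMINISTRATOR in user.groups:
        return True
    if user.groups & NODE_GROUPS:
        return user.node == project.node
    return project.client == user.name


leader = User("leader_a", "NodeA", {Group.PROJECT_LEADER})
client = User("client_x", "NodeB")
project = Project("Time course study", node="NodeA", client="client_x")
print(can_create_project(leader, "NodeB"))  # False
print(can_view_project(client, project))    # True
```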

Quote management system

This module was designed specifically for core facilities that provide metabolomic services to client researchers. Potential clients can request a pricing quote for running samples of an experiment through the quote request system without having to sign up for an account. At a nominated stage, clients are required to register in MASTR-MS by completing a short information dialog box. This module allows collection of contact details and information about the nature of the request. Files in various formats can be attached to this module. In a multi-node facility, the user can either direct their quote to a specific node with relevant expertise or they can select "Don’t Know" to have all the Node Representatives alerted.

Quote requests made through the system by clients and collaborators are tracked and flagged until they have been attended to, so that Node Representatives can quickly see new quotes that require attention. Quotes can only be seen by members of the node to which they were sent, unless the "Don’t Know" option was selected. Node Representatives are able to forward quotes to other nodes if required. The Node Representatives can then begin a dialogue with the potential client and with their team, clarifying the task, and providing formal quotes, attached as PDFs if necessary. Each step of the communications process is time-stamped and tracked within this module. The quote requests and any resulting quotes would eventually be associated with a project and experiment through a selection option in the Experimental Design stage. All documentation relating to the project, including the client and quote issued for the project, along with the project and experimental setup, is thus kept together.
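The routing rule for new quote requests can be sketched in a few lines. The registry, node names, usernames and function name below are hypothetical; the only behaviour taken from the text is that a request goes to the chosen node's representatives, or to all of them when "Don’t Know" is selected.

```python
from typing import Dict, List, Optional

# Hypothetical registry: node name -> usernames of its Node Representatives.
NODE_REPRESENTATIVES: Dict[str, List[str]] = {
    "NodeA": ["rep_a"],
    "NodeB": ["rep_b1", "rep_b2"],
}


def representatives_to_alert(target_node: Optional[str]) -> List[str]:
    """Return the Node Representatives to notify about a new quote request.

    target_node is None when the requester selected "Don't Know".
    """
    if target_node is None:
        # No node chosen: alert every representative on the network.
        return sorted(rep for reps in NODE_REPRESENTATIVES.values() for rep in reps)
    return sorted(NODE_REPRESENTATIVES.get(target_node, []))


print(representatives_to_alert("NodeB"))  # ['rep_b1', 'rep_b2']
print(representatives_to_alert(None))     # ['rep_a', 'rep_b1', 'rep_b2']
```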

Project management system

This module allows the management of projects, experiments, and samples as well as the creation of analytical sample runs. As detailed above, users of different user groups are able to create projects and experiments. When a project is created by either a MASTR-MS Administrator or Project Leader, it can be linked to a specific client from the user list. This allows the client to monitor how the project is progressing. Assigning a Project Manager to the project allows that user to manage all aspects of the project, experiment creation, and further access control on an experiment-by-experiment basis (Supplemental Fig. S4). As sample metadata is linked to all experiments within MASTR-MS, sample classes and/or individual samples can be organized into groups and subsequently analyzed on an instrument.

Experiment details

The Experiment Status defaults to "New" when first opened, and all experiment metadata is captured in this field (Supplemental Fig. S5A). Once the experiment design has been completed, the Project Manager can change the setting to "Designed" to prevent further changes. The experiment can also be linked to a formal quote that has been previously entered in the quotes system, and if needed, can be assigned an internal job number.

Access control/roles

Users can be assigned to an experiment, giving them access to edit the experimental workflow and create samples and runs. Client users can also be added here, giving them access to project progress information (Supplemental Fig. S5B).

Sample metadata

MASTR-MS uses sample metadata in order to generate sample classes, which can then be populated with individual samples (Supplemental Fig. S5C).

Origin/organs/parts metadata

The first metadata category is the Origin field, which contains information on sample origin and preparation (Supplemental Fig. S5D). Different metadata fields are available depending on whether the source is Microbial, Plant, Animal, Human, Synthetic, or Other.

Timeline/treatment metadata

MASTR-MS also accepts time course and treatment metadata, where samples have been collected over multiple time points, or after different experimental treatments. The Origin, Timeline, and Treatment fields are then used to automatically generate sample classes.

Sample preparation

MASTR-MS allows standard operating procedure (SOP) documents to be uploaded and associated with an experiment. Multiple SOPs can be uploaded and additional notes recorded for each. An SOP is linked with the methods used during a run when the run is set up. The SOP itself is attached at the experiment level, while the choice of methods is made at the run level. This accommodates users who would like to run multiple methods during a run (either by resampling the same vial or from a different vial).

Automatic sample class generation

Sample classes are automatically generated from permutations of the metadata entered in the Origin, Timeline, and Treatment steps (Supplemental Fig. S7A). If abbreviations have been provided for a particular metadata category, these will be used during sample class generation. Samples can then be created in each sample class.
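One way to picture this step is as a Cartesian product over the metadata categories. The sketch below rests on that reading of "permutations" (one value per category, every combination); the function name, the underscore separator and the example values are assumptions, not the MASTR-MS naming scheme.

```python
from itertools import product
from typing import Dict, List


def generate_sample_classes(metadata: Dict[str, List[str]],
                            abbreviations: Dict[str, str]) -> List[str]:
    """Return one class name per combination of metadata values, in category order."""
    categories = [values for values in metadata.values() if values]  # skip empty categories
    if not categories:
        return []
    names = []
    for combination in product(*categories):
        # Use the abbreviation when one was provided, otherwise the full value.
        parts = [abbreviations.get(value, value) for value in combination]
        names.append("_".join(parts))
    return names


metadata = {
    "origin": ["Leaf", "Root"],
    "timeline": ["0h", "24h"],
    "treatment": ["Control", "Drought"],
}
abbreviations = {"Control": "Ctl", "Drought": "Dr"}
classes = generate_sample_classes(metadata, abbreviations)
print(len(classes))  # 8
print(classes[0])    # Leaf_0h_Ctl
```

Two origins, two time points and two treatments give 2 × 2 × 2 = 8 classes, which shows how quickly the class count grows as metadata values are added.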

Samples can then be viewed and collected together to form a run on a designated analytical instrument platform (Supplemental Fig. S7B). Additional sample information can be imported via CSV and exported from MASTR-MS in the same way. Samples can be randomized before putting them into a run if desired.

Runs

Selected samples are added to a new or existing run by clicking the "Add Selected Samples to Run" button. This will display a dialog allowing the user to add the samples either to a new run or to any previous run which is still unlocked for editing (Supplemental Fig. S8A). Runs remain unlocked as long as a worklist has not yet been generated for them. Locked runs can be edited and reused if needed using the “Run Cloning” feature, which will duplicate the run data into a new unlocked run.
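The lock-and-clone behaviour can be modelled with a small class. This is a minimal sketch under assumed names (Run, add_samples, generate_worklist, clone); it mirrors the rules described above rather than the MASTR-MS data model.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Run:
    title: str
    samples: List[str] = field(default_factory=list)
    locked: bool = False  # set once a worklist has been generated

    def add_samples(self, new_samples: List[str]) -> None:
        if self.locked:
            raise ValueError("run is locked; clone it to make changes")
        self.samples.extend(new_samples)

    def generate_worklist(self) -> List[str]:
        """Return the sample order and lock the run against further edits."""
        self.locked = True
        return list(self.samples)

    def clone(self) -> "Run":
        """Copy the run data into a new, unlocked run."""
        return replace(self, title=self.title + " (clone)",
                       samples=list(self.samples), locked=False)


run = Run("Batch 1")
run.add_samples(["S1", "S2"])
run.generate_worklist()
cloned = run.clone()
cloned.add_samples(["S3"])
print(run.locked, cloned.locked)  # True False
print(cloned.samples)             # ['S1', 'S2', 'S3']
```

Cloning copies the sample list rather than sharing it, so edits to the clone leave the locked original untouched.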

Worklist generation

The goal of run configuration is to streamline sample analysis and generate instrument worklists in a convenient and flexible manner. After sample data has been added to a run, additional run elements (Sweeps, Solvents, etc.) can be added, and their order and sequencing defined, via the Rule Generator.

The Rule Generator provides a customizable set of steps (rules) which dictate how worklists are built. It consists of a Start Block, Sample Block, and End Block, each of which allows the insertion of non-sample components into the worklist. These include Pooled Biological QC, Sweep, Reagent Blank, Solvent Blank and Pure Standard.

The Sample Block, containing the experiment samples, allows n components to be inserted every m samples, in random or position order (Supplemental Fig. S8B). Once all three blocks have been designed, the Rule Generator can be enabled, disabling further editing and making the Rule available for inclusion in Run worklist generation. Rule Generators can be restricted to use by a single user, an entire node, or everybody on the system. Enabled Rule Generators can be cloned in order to generate a new version, which can then be extended and modified.
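The three-block rule can be sketched as a function that concatenates a start block, an interleaved sample block and an end block. The function and parameter names are hypothetical, and applying the "random or position order" choice to the samples is one reading of the text, not a confirmed detail of MASTR-MS.

```python
import random
from typing import List, Optional


def build_worklist(samples: List[str], start_block: List[str], end_block: List[str],
                   insert: str, n: int, m: int, seed: Optional[int] = None) -> List[str]:
    """Assemble start block, sample block and end block into one ordered list.

    Within the sample block, n copies of `insert` follow every m samples.
    Passing a seed shuffles the samples reproducibly; None keeps position order.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    ordered = list(samples)
    if seed is not None:
        random.Random(seed).shuffle(ordered)

    sample_block: List[str] = []
    for index, sample in enumerate(ordered, start=1):
        sample_block.append(sample)
        if index % m == 0:
            sample_block.extend([insert] * n)
    return start_block + sample_block + end_block


worklist = build_worklist(
    samples=["S1", "S2", "S3", "S4", "S5"],
    start_block=["Solvent Blank", "Pure Standard"],
    end_block=["Reagent Blank"],
    insert="Pooled Biological QC",
    n=1,
    m=2,
)
print(worklist)
```

With n = 1 and m = 2, a pooled QC follows S2 and S4; S5 is followed directly by the end block because no full group of two samples remains.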

To generate a worklist within a run, the user selects an instrument (configured and made available by Administrators) and a Rule Generator, if needed, and clicks the "Generate Worklist" button. Once the worklist is generated, further modification of the run is not possible. The specific worklist format is customizable by site administrators to provide flexibility among various instrument models. Once the worklist is generated, it can be used with the instrument to automate the raw data collection process.

Availability and requirements

Project name: MASTR-MS

Project home page: https://muccg.github.io/mastr-ms/

Operating system(s): Server Installation: CentOS 6.x (x86_64); Client: Any operating system and modern web browser can be used as the web client to access MASTR-MS; DataSync Client: Linux or Windows

Programming language: Python 2.7

Software requirements: Apache 2.2 or higher, PostgreSQL 8.4 or higher

License: GNU GPL v3

Any restrictions to use by non-academics: See GNU GPL v3

Acknowledgements

This project is supported by Bioplatforms Australia Ltd., the Australian National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund Super Science Initiative. The authors gratefully acknowledge additional funding from the Australian National Health and Medical Research Council (APP634485, APP1055319) and the EU FP7 Project (HEALTH.2012.2.1.1-1-C): RD Connect: An integrated platform connecting databases, registries, biobanks and clinical bioinformatics for rare disease research. MJM is an NHMRC Principal Research Fellow. AB acknowledges the support of the ARC Centre of Excellence in Plant Cell Walls. The authors acknowledge the many contributions made by other researchers in the Bioplatforms Australia network, including Michael Clarke, Hayden Walker, Dorothee Hayne, Robert Trengove and Catherine Rawlinson.

Adam Hunter, Saravanan Dayalan and David De Souza have contributed equally to this work.

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

Supplementary material 1 (JPG 137 KB)

Supplementary material 2 (PNG 48 KB)

Supplementary material 3 (PNG 6836 KB)

Supplementary material 4 (PNG 3593 KB)

Supplementary material 5 (PNG 2913 KB)

References

  1. "Metabolomics Australia". Metabolomics Australia. http://www.metabolomics.net.au/. Retrieved 05 December 2014. 
  2. "The Metabolomics Innovation Centre". University of Alberta. http://www.metabolomicscentre.ca/. Retrieved 05 December 2014. 
  3. "Metabolomics". The Common Fund. National Institutes of Health, Office of Strategic Coordination. http://commonfund.nih.gov/metabolomics/index. Retrieved 05 December 2014. 
  4. "MetaboHUB". MetaboHUB Centre INRA Bordeaux - Aquitaine. http://www.metabohub.fr/. Retrieved 05 December 2014. 
  5. "Guidelines on FAIR Data Management in Horizon 2020 - Version 3.0" (PDF). European Commission. 26 July 2016. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 05 December 2014. 
  6. "NIH Data Sharing Policy and Implementation Guidance". Grants and Funding. National Institutes of Health. 5 March 2003. https://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm. Retrieved 05 December 2014. 
  7. "Guidance for researchers: Developing a data management and sharing plan". Policy and position statements. Wellcome Trust. 2014. Archived from the original on 18 October 2014. https://web-beta.archive.org/web/20141018165611/http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Guidance-for-researchers/index.htm. Retrieved 05 December 2014. 
  8. "Australian Code for the Responsible Conduct of Research". National Health and Medical Research Council, Australia. 2007. https://www.nhmrc.gov.au/guidelines-publications/r39. Retrieved 05 December 2014. 
  9. "Data Policies". Nature. Macmillan Publishers Limited. https://www.nature.com/sdata/policies/data-policies. Retrieved 05 December 2014. 
  10. "Instructions for authors: Research Articles". GigaScience. BioMed Central Ltd. 2014. Archived from the original on 15 May 2014. https://web.archive.org/web/20140515012736/http://www.gigasciencejournal.com:80/authors/instructions/research. Retrieved 05 December 2014. 
  11. Haug, K.; Salek, R.M.; Conesa, P. (2013). "MetaboLights: An open-access general-purpose repository for metabolomics studies and associated meta-data". Nucleic Acids Research 41 (D1): D781-6. doi:10.1093/nar/gks1004. PMC 3531110. PMID 23109552. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531110. 
  12. "Metabolomics Workbench". University of California San Diego. http://www.metabolomicsworkbench.org/. Retrieved 05 December 2014. 
  13. "Python". Python Software Foundation. https://www.python.org/. Retrieved 05 December 2014. 
  14. "Django". Django Software Foundation. https://www.djangoproject.com/. Retrieved 05 December 2014. 
  15. "PostgreSQL". PostgreSQL Global Development Group. https://www.postgresql.org/. Retrieved 05 December 2014. 
  16. "MySQL". Oracle Corporation. https://www.mysql.com/. Retrieved 05 December 2014. 
  17. "wxWidgets". wxWidgets Development Team. https://www.wxwidgets.org/. Retrieved 10 November 2016. 
  18. "rsync". Wayne Davison. https://rsync.samba.org/. Retrieved 05 December 2014. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article's references were in alphabetical order; the references here are shown in order of appearance in the article due to the way the wiki processes references. In several cases the original URL had changed; an archived version of the URL was used instead. The supplemental figures mentioned in the text have a different name than the ones supplied at the end; it's thus unclear which supplementary files match which supplementary references.