Difference between revisions of "Journal:MendeLIMS: A web-based laboratory information management system for clinical genome sequencing"

|download    = [http://www.biomedcentral.com/content/pdf/1471-2105-15-290.pdf http://www.biomedcentral.com/content/pdf/1471-2105-15-290.pdf] (PDF)
}}
==Abstract==
'''Background:''' Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate [[laboratory information management system]]s to manage the thousands of patient samples that are subject to this type of genetic analysis.
|}
|}
The look and feel of the application is easily changed or customized because all web pages inherit styles from an application-wide cascading style sheet (CSS), and, in keeping with RoR convention, overall page layout and navigation are controlled by a single HTML layout file.
===Sample nomenclature===
To enable accurate tracking of samples from their initial acquisition, through all key intermediate steps and ultimately to DNA sequencing, we developed a sample labeling nomenclature loosely based on the scheme employed by The Cancer Genome Atlas (TCGA) project (https://wiki.nci.nih.gov/display/TCGA/Working+with+TCGA+Data). We maintain the original unique sample barcode and add successive suffixes to indicate the processing performed.
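As a rough illustration of a suffix-based nomenclature of this kind, the sketch below appends one suffix per processing step to the original barcode and recovers the source barcode by stripping the suffixes. The suffix codes (<code>D</code>, <code>X</code>), the <code>.</code> separator and the function names are hypothetical and are not taken from MendeLIMS.

```python
# Hypothetical suffix scheme, for illustration only: the actual MendeLIMS
# suffix codes and separator are not given in the text.
PROCESSING_SUFFIXES = {"dissection": "D", "extraction": "X"}


def derive_barcode(parent_barcode: str, step: str, sequence: int) -> str:
    """Append a suffix such as '.X01' to the parent barcode."""
    code = PROCESSING_SUFFIXES[step]
    return f"{parent_barcode}.{code}{sequence:02d}"


def source_barcode(barcode: str) -> str:
    """Return the original sample barcode by dropping all suffixes.

    Assumes the original barcode itself never contains the '.' separator.
    """
    return barcode.split(".", 1)[0]


dissected = derive_barcode("000123", "dissection", 1)
extracted = derive_barcode(dissected, "extraction", 2)
lineage = extracted.split(".")  # original barcode followed by each step
```

Because every derived label begins with the unchanged original barcode, any intermediate product can be traced back to its source sample by inspection alone.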
===Acquisition of clinical study samples===
After enrollment into a study, patient samples and their characteristics are entered into MendeLIMS through a web interface (Figure 2). A unique identifier (ID) is assigned for each new clinical sample. The user has the option of entering sample-relevant clinical data including pathology information from clinical reports, digital images originating from pathology slides and general clinical information about the patient (Figure 3). For efficient subsequent retrieval of the physical samples, the storage freezer and container location are entered using a standard nomenclature. If email triggers are configured, an email is automatically sent to an identified central coordinator and/or to a specified owner for the particular clinical trial, giving details of any new sample entered into the system. Because the interface is web-based, sample entry can occur at any location, which facilitates sample entry by various researchers and clinical coordinators.
[[File:Fig3 Grimes BMCBioinformatics2014 15.jpg|700px]]
{{clear}}
<blockquote>'''Figure 3: Workflow of MendeLIMS.''' Multiple steps of the sample acquisition workflow for clinical studies are<br />fully integrated with next generation or genomic assay procedures. This allows one to trace the genomic<br />analysis of any given sample.</blockquote>
===Clinical study sample assessment and processing===
Any manipulation of clinical study samples is tracked (Figure 3). This includes dissection of tissue samples and nucleic acid extraction. Details of these sample workflow operations are stored in MendeLIMS, including volumes of any extracted macromolecules such as genomic DNA, concentration metrics and sample storage location. This greatly facilitates the management of these precious resources for population studies.
===Tracking molecular and genomic analysis===
Molecular assays and sequencing library steps are also captured in MendeLIMS (Figure 3). Sequencing libraries may be entered as singleplex (i.e. one sample per library), or multiplex (i.e. multiple samples per library with each sample tagged with a unique starting sequence). The multiplex indexing schemes are user-configurable, both for the number of samples that can be multiplexed on one lane and for the actual starting sequences used.
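The constraints on a multiplex library described above (a configurable maximum number of samples per lane, and a unique starting sequence per sample) can be sketched as a simple validation routine. This is an illustrative sketch only; the class and function names are hypothetical and do not reflect the MendeLIMS code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedSample:
    sample_barcode: str
    index_sequence: str  # the starting sequence (tag) assigned to this sample


def validate_library(samples, max_per_lane):
    """Return a list of problems; an empty list means the library is valid."""
    problems = []
    if not samples:
        problems.append("library has no samples")
    if len(samples) > max_per_lane:
        problems.append(
            f"{len(samples)} samples exceed the limit of {max_per_lane}"
        )
    seen = {}  # index sequence -> barcode of the first sample that used it
    for sample in samples:
        first_user = seen.get(sample.index_sequence)
        if first_user is not None:
            problems.append(
                f"index {sample.index_sequence} used by "
                f"{first_user} and {sample.sample_barcode}"
            )
        else:
            seen[sample.index_sequence] = sample.sample_barcode
    return problems


ok = validate_library(
    [IndexedSample("A", "ACGT"), IndexedSample("B", "TGCA")], max_per_lane=12
)
clash = validate_library(
    [IndexedSample("A", "ACGT"), IndexedSample("B", "ACGT")], max_per_lane=12
)
```

A singleplex library is simply the case of one sample, so the same check covers both library types.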
===Tracking the next generation sequencing workflow===
In preparation for an NGS analysis, a sequencing run is entered into the system by selecting existing libraries and placing them into separate lanes or partitions. Normally an entire sequencing run is entered. However, the system can also handle partial sequencing runs, to accommodate the scenario where sequencing is performed as a service and the run is shared between multiple groups who are not privy to each other’s results. MendeLIMS generates a unique sequencing run key based on the sequencing date, sequencing machine, and a unique sequential run number. Once the sequencing run has completed, the initial quality control (QC) metrics for the run can be entered into the system. This is currently handled by an offline Ruby script, but will in the future be incorporated into the web application. MendeLIMS supports any type of sequencing application including whole genome, exome, targeted and RNA-based sequencing studies. The system stores sequencing library and sample lineage, flow cell composition, and sequencing run metadata, along with run status and QC metrics for all runs (Figure 4). The sequencing data files (for example, BAM alignment files or VCF variant calling files) are not stored in MendeLIMS per se but are on a storage cluster accessible to all researchers in the group. Additionally, since we use the MendeLIMS sequencing run key and sequencing library/sample nomenclature in the analysis directory and file names, the files are easily cross-referenced between MendeLIMS and the storage cluster.
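To make the run key and the cross-referencing idea concrete, the following sketch composes a key from the three stated components and reuses it in a file path. The exact key format, the directory layout, the library name and the function names are assumptions made for illustration; the text does not specify them.

```python
from datetime import date
from pathlib import PurePosixPath


def sequencing_run_key(run_date: date, machine: str, run_number: int) -> str:
    # Hypothetical format: <YYMMDD>_<machine>_<4-digit sequential run number>
    return f"{run_date:%y%m%d}_{machine}_{run_number:04d}"


def analysis_path(root: str, run_key: str, library_name: str, extension: str):
    # Hypothetical layout: <root>/<run key>/<library name>.<extension>
    return PurePosixPath(root) / run_key / f"{library_name}.{extension}"


key = sequencing_run_key(date(2014, 3, 7), "HiSeq2500", 42)
bam = analysis_path("/data/runs", key, "L000315", "bam")
```

Embedding the same key in both the database record and the directory name means a file on the storage cluster can be matched to its run without any separate lookup table.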
[[File:Fig4 Grimes BMCBioinformatics2014 15.jpg|1000px]]
{{clear}}
<blockquote>'''Figure 4: Tracking the sequencing of clinical samples.''' One can follow a clinical sample from enrollment in a clinical study all the way through<br />to its sequencing. For example, from the sequencing run composition one can back track to the individual libraries and the original source DNA.<br />Screen shots show the various levels of querying. A sequencing library can be queried for additional information. When required, it is possible to even<br />determine the time of enrollment in a study and pull up relevant images from pathology.</blockquote>
===Queries of MendeLIMS===
All queries allow specification of multiple filter criteria such as barcode range, date range, owner, and protocol, which enables users to quickly find the samples of interest and then drill down to more detail. For example, when viewing the sample query result set (Figure 2), clicking on the sample barcode will bring up more comprehensive information regarding the sample, including pathology information. Clicking on the ‘QC’ link from a sequencing library query result set shows QC data for all sequencing runs for that library. Query results may be exported to a tab-delimited file for review or for incorporation with other local data.
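The combination of optional filters and tab-delimited export can be sketched as follows. The in-memory records, field names and function names are hypothetical stand-ins; in MendeLIMS itself the filtering is performed against the relational database.

```python
import csv
import io
from datetime import date

# Hypothetical sample records, for illustration only.
samples = [
    {"barcode": "000101", "acquired": date(2014, 1, 15), "owner": "lab_a", "protocol": "P1"},
    {"barcode": "000150", "acquired": date(2014, 2, 20), "owner": "lab_b", "protocol": "P1"},
    {"barcode": "000250", "acquired": date(2014, 3, 5), "owner": "lab_a", "protocol": "P2"},
]


def query_samples(rows, barcode_range=None, date_range=None, owner=None, protocol=None):
    """Apply only the filters that were supplied; all supplied filters must match."""
    result = []
    for row in rows:
        # Fixed-width, zero-padded barcodes compare correctly as strings.
        if barcode_range and not (barcode_range[0] <= row["barcode"] <= barcode_range[1]):
            continue
        if date_range and not (date_range[0] <= row["acquired"] <= date_range[1]):
            continue
        if owner and row["owner"] != owner:
            continue
        if protocol and row["protocol"] != protocol:
            continue
        result.append(row)
    return result


def to_tab_delimited(rows, columns):
    """Render the selected columns as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[column] for column in columns])
    return buffer.getvalue()


hits = query_samples(samples, barcode_range=("000100", "000199"), owner="lab_a")
export = to_tab_delimited(hits, ["barcode", "acquired", "owner"])
```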
===Reagent tracking===
Ordering of reagents, equipment and supplies, though not necessarily typical of a LIMS implementation, has been included in MendeLIMS. This feature enables, for example, the tracking of reagent and supply batches, which is useful in troubleshooting failed sequencing runs, and the tracking of all expenditures against a specific funding account.
===Security===
Hypertext Transfer Protocol Secure (HTTPS) is supported and is currently implemented for user login pages, but is easily extended to other pages as needed. User authentication is via a userid and password, and access to functionality is controlled via user roles which are defined and managed from the website by a user with the ‘admin’ role. Other roles available include ‘clinical’, which allows create/modify access to clinical study sample information; ‘clin_admin’, which allows modification of drop-down lists used for system validation of sample data; and ‘researcher’, which allows create/modify access to sequencing libraries and sequencing runs. The user manual in the supplementary material provides descriptions of all available user roles.
Given the extreme complexity of dealing with private health information (PHI), MendeLIMS is not designed to incorporate PHI-related clinical data. MendeLIMS does store a patient identifier that is the link between MendeLIMS and other patient clinical information databases that are securely stored in a very limited access environment. The identifier is stored as a binary encrypted field in the MySQL database, and access to this field via the web application is limited to users with a ‘clinical’ or ‘clin_admin’ role; other users only see a unique system-generated patient identifier which for all intents and purposes is anonymous.
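The role scheme and the restricted patient identifier described above can be summarized in a small sketch. The role names are those given in the text; the permission labels, record fields and function names are hypothetical, and encryption of the stored identifier is deliberately not modeled here.

```python
# Role names come from the text; the permission labels are hypothetical.
ROLE_PERMISSIONS = {
    "admin": {"manage_users"},
    "clinical": {"edit_clinical_samples", "view_patient_identifier"},
    "clin_admin": {"edit_validation_lists", "view_patient_identifier"},
    "researcher": {"edit_libraries", "edit_sequencing_runs"},
}


def is_allowed(user_roles, permission):
    """True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles
    )


def displayed_patient_id(user_roles, patient):
    """Show the linking identifier only to roles permitted to see it."""
    if is_allowed(user_roles, "view_patient_identifier"):
        return patient["clinical_identifier"]
    return patient["system_id"]  # anonymous, system-generated identifier


# Hypothetical record, for illustration only.
patient = {"system_id": "PT0042", "clinical_identifier": "CLIN-EXAMPLE-001"}
seen_by_researcher = displayed_patient_id(["researcher"], patient)
seen_by_clinical = displayed_patient_id(["researcher", "clinical"], patient)
```

Keeping the permission check in one function means a user holding several roles is handled uniformly, and an unknown role grants nothing.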
In our current implementation used by several groups at Stanford University, MendeLIMS is integrated into an internal network, within a secured firewall. All database transactions are logged and time-stamped to provide an audit trail, and automated database backups are run daily. An administrator can readily generate an audit report to keep track of changes made by users.
==Discussion==
There are commercial LIMS solutions available for NGS labs, some of which have been implemented at major genomic research centers. For example, GeneSifter LAB Edition<ref name="GSLABEd">{{cite web |url=http://www.geospiza.com/Products/LabEdition.shtml |title=GeneSifter Lab Edition |publisher=PerkinElmer, Inc}}</ref> has been implemented at Vanderbilt University, [[Progeny Software, LLC|Progeny LIMS]]<ref name="Progeny">{{cite web |url=http://www.progenygenetics.com/lims/ |title=Progeny LIMS |publisher=Progeny Genetics, LLC}}</ref> at the University of Pittsburgh, and [[GenoLogics Life Sciences Software Inc.|Clarity LIMS]]<ref name="Clarity">{{cite web |url=http://www.genologics.com/clarity-lims/ |title=Clarity LIMS |publisher=GenoLogics Life Sciences Software, Inc}}</ref> at the University of Washington. These systems have significant capabilities. However, the cost in time and money to implement them is often out of reach for smaller organizations, particularly those that rely on funding from research grants or that require unique workflows that cannot be implemented readily in a system designed for a larger institution. Given these resource constraints, open source options are of greater interest to such organizations.
There are several simpler LIMS systems covering clinical study samples such as [[BonsaiLIMS]]<ref name="BathBons11">{{cite journal |title=LimsPortal and BonsaiLIMS: development of a lab information management system for translational medicine |journal=Source Code for Biology and Medicine |author=Bath, Timothy G.; Bozdag, Selcuk; Afzal, Vackar; Crowther, Daniel |volume=6 |pages=9 |year=2011 |doi=10.1186/1751-0473-6-9 |pmid=21569484 |pmc=PMC3113716}}</ref>, PASSIM<ref name="ViksnaPASS07">{{cite journal |title=PASSIM: An open source software system for managing information in biomedical studies |journal=BMC Bioinformatics |author=Viksna, J.; Celms, E.; Opmanis, M.; Podnieks, K.; Rucevskis, P.; Zarins, A.; Barrett, A.; Neogi, S.G.; Krestyaninova, M.; McCarthy, M.I.; Brazma, A.; Sarkans, U. |volume=8 |pages=52 |year=2007 |doi=10.1186/1471-2105-8-52 |pmid=17291344 |pmc=PMC1803798}}</ref> and SLIMS.<ref name="VanRossumSLIMS10">{{cite journal |title=SLIMS: A user-friendly sample operations and inventory management system for genotyping labs |journal=Bioinformatics |author=Van Rossum, T.; Tripp, B.; Daley, D. |volume=26 |issue=14 |pages=1808-1810 |year=2010 |doi=10.1093/bioinformatics/btq271 |pmid=20513665 |pmc=PMC2894515}}</ref> These LIMS offer basic sample management but do not offer comprehensive clinical sample tracking or the ability to define sequencing libraries and flow cell/sequencing run composition for NGS processing. More recent offerings which are available for open source installation and do support NGS processing include [[Galaxy (biomedical software)|Galaxy LIMS]]<ref name="ScholtalbersGal13">{{cite journal |title=Galaxy LIMS for next-generation sequencing |journal=Bioinformatics |author=Scholtalbers, J.; Rossler, J.; Sorn, P.; de Graaf, J.; Boisguerin, V.; Castle, J.; Sahin, U. 
|volume=29 |issue=9 |pages=1233-1234 |year=2013 |doi=10.1093/bioinformatics/btt115 |pmid=23479349}}</ref> and [[GNomEx|GNomEx LIMS]].<ref name="NixNext10">{{cite journal |title=Next generation tools for genomic data generation, distribution, and visualization |journal=BMC Bioinformatics |author=Nix, David A.; Di Sera, Tonya L.; Dalley, Brian K.; Milash, Brett A.; Cundick, Robert M.; Quinn, Kevin S.; Courdy, Samir J. |volume=11 |pages=455 |year=2010 |doi=10.1186/1471-2105-11-455 |pmid=20828407 |pmc=PMC2944281}}</ref> These systems address the flow from DNA/RNA extraction to sequencing library to flow cell/sequencing run. GNomEx LIMS also provides some analysis workflow capability and integrates data visualization via genome browsers such as UCSC Genome Browser<ref name="KentTheHuman02">{{cite journal |title=The human genome browser at UCSC |journal=Genome Research |author=Kent, W.J.; Sugnet, C.W; Furey, T.S.; Roskin, K.M.; Pringle, T.H.; Zahler, A.M.; Haussler, D. |volume=12 |issue=6 |pages=996-1006 |year=2002 |doi=10.1101/gr.229102 |pmid=12045153 |pmc=PMC186604}}</ref> or Integrative Genomics Viewer (IGV).<ref name="ThorvaldsdottirInt13">{{cite journal |title=Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration |journal=Briefings in Bioinformatics |author=Thorvaldsdottir, H.; Robinson, J.T.; Mesirov, J.P. |volume=14 |issue=2 |pages=178-192 |year=2013 |doi=10.1093/bib/bbs017 |pmid=22517427 |pmc=PMC3603213}}</ref> Galaxy LIMS takes advantage of the Galaxy infrastructure to also provide analysis workflow and data visualization. However, none of these systems natively provide tracking of clinical study data such as consent protocols, pathology and histopathology information. Sample lineage and sample tracking via consistent nomenclature, drill down to various levels of source data, and freezer container/location information is also not addressed.
Another open source option is QTREDS.<ref name="PallaQT14">{{cite journal |title=QTREDS: A Ruby on Rails-based platform for omics laboratories |journal=BMC Bioinformatics |author=Palla, P.; Frau, G.; Vargiu, L.; Rodriguez-Tome, P. |volume=15 |issue=Suppl 1 |pages=S13 |year=2014 |doi=10.1186/1471-2105-15-S1-S13 |pmid=24564791 |pmc=PMC4015218}}</ref> This LIMS has a strong focus on experimental protocols for sample preparation and tracks detailed steps that MendeLIMS and other systems do not specifically track, such as sonication, end repair or ligation as part of exome library preparation. QTREDS also manages inventory of reagents for sample preparation and triggers low stock level alerts. However, there is no tracking associated with clinical study samples, and NGS support is limited to the sequencing libraries and their associated sequencing status, rather than flow cell composition and the sequencing run itself. In contrast, MendeLIMS provides full sample lineage tracking back to the patient and to the clinic and consent protocol where the sample originated, as well as support for all major processing steps through to DNA/RNA sequencing and QC (Table 1). MendeLIMS also records which cluster and sequencing kit versions were used for a particular run, and which publications (if any) reference the results from that run.
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="50%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="5"|'''Table 1.''' Comparison among different LIMS systems
|-
  ! LIMS software
  ! MendeLIMS
  ! GNomEx
  ! Galaxy LIMS
  ! QTREDS
|-
  | style="background-color:white; padding-left:10px; padding-right:10px; font-weight:bold;"  colspan="5"|''Clinical study patient samples''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Patient data (gender/race, MRN, pathology, histology)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Sample processing (dissections, extractions)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>a</sup>
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Sample location tracking
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px; font-weight:bold;"  colspan="5"|''Arrays, libraries, sequencing runs''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Molecular assays (genomic arrays, ddPCR, ..)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Sequencing library prep (singleplex and multiplex)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>b</sup>
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Flow cell/sequencing run setup
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
|-
  | style="background-color:white; padding-left:10px; padding-right:10px; font-weight:bold;"  colspan="5"|''Post-sequencing analysis''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Sequencing QC
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|''nd''
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>c</sup>
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Analysis workflow
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>c</sup>
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
|-
  | style="background-color:white; padding-left:10px; padding-right:10px; font-weight:bold;"  colspan="5"|''Security/Audit trail''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Authorization via user roles
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Audit trail
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|''nd''
  | style="background-color:white; padding-left:10px; padding-right:10px;"|''nd''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|HTTPS/SSL security
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|''nd''
  | style="background-color:white; padding-left:10px; padding-right:10px;"|''nd''
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px; font-weight:bold;"  colspan="5"|''General''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Email/notification capability
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>c</sup>
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Customizable lists for data validation
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Instrument integration
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>d</sup>
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Attach files to samples/libraries/sequencing runs
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|''nd''
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|''nd''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Visualization of results
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>e</sup>
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes<sup>c</sup>
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
|-
  | style="background-color:white; padding-left:10px; padding-right:10px; font-weight:bold;"  colspan="5"|''Other''
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Project based billing
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Reagent inventory management
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Publicly available User Guide/Demos
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
  | style="background-color:white; padding-left:10px; padding-right:10px;"|No
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Yes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="5"|<sup>a)</sup> Detailed information tracked regarding sample prep; <sup>b)</sup> Singleplex libraries only; <sup>c)</sup> Functionality provided via integration with Galaxy and genome browsers; <sup>d)</sup> Integration with HiSeq 2000 only; <sup>e)</sup> Functionality provided via integration with GenoPub.
''nd'' - Indicates that it could not be determined from public documentation whether the functionality is provided.
|-
|}
|}
==Conclusions==
Clinical population studies using NGS require management of hundreds if not thousands of samples, including various intermediate processing steps, and the resulting sequencing data. A LIMS is critical to the effective management of this data and the generation of reproducible results. The currently available open source or commercial systems may meet the needs of some research groups. However, for groups where the time and monetary cost of a comprehensive commercial system is prohibitive, there is no end-to-end open source solution covering enrollment of patients in a clinical study through genome sequencing analysis. Our system addresses these needs with a specific focus on the seamless integration of clinical study enrollment through to NGS.
In MendeLIMS, all data is consolidated into one authoritative centrally accessible source repository, eliminating multiple distributed spreadsheets. Samples are traceable from a lane on a sequencing run, back to the patient diagnosis, pathology, and all processing steps in between. In conjunction with a standard barcoding nomenclature and flexible query capability, this significantly reduces errors in sample tracking, provides a comprehensive view of data being sequenced and has resulted in MendeLIMS becoming an invaluable tool for the management of our clinical sequencing studies.
==Availability and requirements==
'''Project name:''' MendeLIMS
'''Project home page:''' http://dna-discovery.stanford.edu/software/mendelims/
'''Project demo site:''' http://mendelims.stanford.edu/
'''Operating system(s):''' Platform independent
'''Programming Language(s):''' Ruby, Ruby on Rails, HTML, JavaScript
'''Server requirements:''' Apache2, Mongrel or Passenger, Ruby 1.9.3+, Rails 3.2.x, MySQL 5.0
'''Web browser requirements:''' Firefox, Chrome, IE, Safari
'''License:''' Any restrictions to use by non-academics: None
==Abbreviations==
NGS: Next generation sequencing
LIMS: Laboratory information management system
RoR: Ruby on Rails
RDBMS: Relational database management system
CSS: Cascading style sheet
TCGA: The Cancer Genome Atlas
ID: Identifier
QC: Quality control
HTTPS: Hypertext Transfer Protocol Secure
PHI: Private health information
IGV: Integrative Genomics Viewer
==Competing interests==
The authors declare that they have no competing interests.
==Authors’ contributions==
SMG and HPJ drafted the manuscript. SMG wrote the software, set up the website implementation and wrote the user guide. SMG and HPJ conceived and designed the project. Both authors read and approved the final manuscript.
==Acknowledgements==
We would like to acknowledge Rebecca Kim for help with design for sample acquisition and processing; Jason Buenrostro and Erik Hopmans for help with design for sequencing libraries and flow cells; Jennifer Palm for her assistance with graphics and Curt Palm for his help with server configuration for the MendeLIMS demo site.
This work was supported by the following grants from the NIH: RC2HG005570 to S.M.G. and H.P.J., DK56339 to H.P.J., and 2P01HG000205 to S.M.G. and H.P.J. H.P.J. was supported by a Research Scholar Grant (RSG-13-297-01-TBG) from the American Cancer Society. In addition, H.P.J. received support from the Doris Duke Clinical Foundation, the Clayville Foundation, the Liu BieJu Cha and Family Fellowship in Cancer and the Howard Hughes Medical Foundation.


==References==


<!--Place all category tags here-->
[[Category:LIMSwiki journal articles (added in 2015)‎]]
[[Category:LIMSwiki journal articles (all)‎]]
[[Category:LIMSwiki journal articles on bioinformatics‎‎]]
[[Category:LIMSwiki journal articles on software‎]]

Latest revision as of 20:58, 28 December 2015

Full article title MendeLIMS: A web-based laboratory information management system for clinical genome sequencing
Journal BMC Bioinformatics
Author(s) Grimes, Susan M.; Ji, Hanlee P.
Author affiliation(s) Stanford University
Primary contact Email: genomics_ji@stanford.edu
Year published 2014
Volume and issue 15
Page(s) 290
DOI 10.1186/1471-2105-15-290
ISSN 1471-2105
Distribution license Creative Commons Attribution 2.0 Generic
Website http://www.biomedcentral.com/1471-2105/15/290
Download http://www.biomedcentral.com/content/pdf/1471-2105-15-290.pdf (PDF)

Abstract

Background: Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis.

Results: To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system, referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies.

Conclusions: We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

Keywords: Next generation sequencing; Clinical studies; Laboratory information management; Pathology; Genomics; Genetics

Background

With next generation DNA sequencing (NGS) now being a commonly adopted technology, the genetic analysis of large clinical populations has become practical and is widely used for identifying disease-related germline and somatic variants such as cancer mutations. The genetic variation from thousands of individuals can now be identified with NGS whole genome, exome, targeted and other resequencing approaches. Due to the dramatic increase in the number of NGS clinical genomics studies, it has become increasingly important to develop adequate laboratory information management systems (LIMS) to manage the thousands of patient samples that are subject to NGS analysis. Tracking and managing the clinical sample workflow involved in NGS analysis is an extremely difficult task, given the logistical issues of enrolling patients, fragmented procedures for acquisition of clinical study samples, complex molecular preparation steps and the intricacies of the NGS processing pipeline. Commercial systems are available but typically are high cost and require extensive modification to address the specific needs of biomedical research groups conducting genetic analysis on populations.

As a general and unique solution to the needs of managing the experimental workflow for clinical genome sequencing projects, we developed MendeLIMS, a web-based, robust and flexible solution for integrating the management of clinical study samples and NGS processes. With respect to genetic studies, MendeLIMS functionality can be grouped into four major categories: (i) enrollment of patients and acquisition of clinical study samples, (ii) sample assessment and processing, (iii) genomic analysis through preparation of next generation DNA sequencing libraries or other molecular assays such as microarrays, and (iv) DNA sequencing of samples with associated quality control metrics. Tracking of sequencing steps is currently supported for the following Illumina NGS instruments: GAIIx, MiSeq, HiSeq, HiSeq2500 and NextSeq. The system can easily be configured for any type of NGS instrument that follows a sequencing-library-to-flow-cell workflow. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu.

Implementation

MendeLIMS is written in Ruby using the open source web application framework Ruby on Rails (RoR) and implementation is platform-independent. Instructions for installation are provided in Additional file 1. For our own in-house instance of MendeLIMS, our servers run Linux/Ubuntu and we use the MySQL relational database management system (RDBMS). The application is easily configured to use any other SQL RDBMS supported by RoR. Figure 1 shows a simplified database schema for the major tables. A more comprehensive schema is provided in Additional file 2.

Additional file 1. An installation guide for MendeLIMS.
Format: PDF; Size: 1,016KB Download file

Figure 1: Database schema for MendeLIMS. Main entities and their relationships are shown in this diagram,
and a complete schema showing other ancillary tables is provided in the supplementary material.

Additional file 2. MendeLIMS database schema diagram.
Format: PDF; Size: 54KB Download file

The web interface is designed to handle a variety of queries in a modular format (Figure 2). To facilitate consistent data entry, MendeLIMS uses drop-down lists for data validation whenever possible. The drop-down lists themselves can be configured by users with the appropriate authorization. Examples of user-configurable items include sample types, sequencing library multiplexing schemes, alignment references and DNA sequencers. All of the features are described in the user’s manual (Additional file 3).

Figure 2: Query web interfaces for MendeLIMS. Database queries are managed by a series of web pages that
have a modular format. Different search parameters are included, with drop-down menus used for standardized search
terminology. Based on the needs of any given group, the search interface can be easily modified to accommodate
new search or entry functions.

Additional file 3. User’s guide for MendeLIMS.
Format: PDF; Size: 338KB Download file

The look and feel of the application is easily changed or customized since all web pages inherit styles from an application-wide cascading style sheet (CSS), and in keeping with RoR convention, overall page layout and navigation is controlled by a single HTML layout file.

Sample nomenclature

To enable accurate tracking of samples from their initial acquisition, through all key intermediate steps and ultimately to DNA sequencing, we developed a sample labeling nomenclature loosely based on the scheme employed by the Cancer Genome Atlas (TCGA) project (https://wiki.nci.nih.gov/display/TCGA/Working+with+TCGA+Data). We maintain the original unique sample barcode, and add successive suffixes to indicate processing performed.
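The suffixing idea can be sketched in a few lines of Python. This is an illustration only, not the MendeLIMS implementation: the example barcode, the dot separator and the suffix codes (`D` for a dissection, `X` for an extraction) are assumptions made for the example, since the actual codes are defined in the user's guide rather than in this text.

```python
# Hypothetical suffix codes; the real MendeLIMS codes are not given here.
SUFFIX_CODES = {"dissection": "D", "extraction": "X"}


def derive_barcode(parent: str, step: str, ordinal: int) -> str:
    """Append a suffix describing one processing step to the parent barcode."""
    return f"{parent}.{SUFFIX_CODES[step]}{ordinal:02d}"


def source_barcode(barcode: str) -> str:
    """Recover the original sample barcode by dropping every suffix."""
    return barcode.split(".", 1)[0]


dissected = derive_barcode("001234", "dissection", 1)
extracted = derive_barcode(dissected, "extraction", 2)
```

Because each derived name begins with its parent's name, the processing history can be read directly from the label and the original sample can always be recovered.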

Acquisition of clinical study samples

After enrollment into a study, patient samples and their characteristics are entered into MendeLIMS through a web interface (Figure 2). A unique identifier (ID) is assigned to each new clinical sample. The user has the option of entering sample-relevant clinical data including pathology information from clinical reports, digital images originating from pathology slides and general clinical information about the patient (Figure 3). For efficient subsequent retrieval of the physical samples, the storage freezer and container location are entered using a standard nomenclature. If email triggers are configured, an email giving details of any new sample entered into the system is automatically sent to an identified central coordinator and/or to a specified owner for the particular clinical trial. Because the interface is web-based, samples can be entered from any location, which facilitates sample entry by various researchers and clinical coordinators.
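The email trigger described above might be sketched as follows. The function name, the addresses and the message wording are all placeholders invented for this example, and the message is only constructed, not sent; how MendeLIMS actually composes and delivers its notifications is not described in this text.

```python
from email.message import EmailMessage


def new_sample_notice(barcode, study, entered_by, recipients):
    """Build (but do not send) a notification about a newly entered sample."""
    msg = EmailMessage()
    msg["Subject"] = f"New sample {barcode} entered for {study}"
    msg["From"] = "lims@example.org"  # placeholder sender address
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Sample {barcode} was entered into study {study} by {entered_by}."
    )
    return msg


# Hypothetical coordinator and study owner addresses.
notice = new_sample_notice(
    "001234", "StudyA", "alice",
    ["coordinator@example.org", "owner@example.org"],
)
```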

Figure 3: Workflow of MendeLIMS. Multiple steps of the sample acquisition workflow for clinical studies are
fully integrated with next generation or genomic assay procedures. This allows one to trace the genomic
analysis of any given sample.

Clinical study sample assessment and processing

Any manipulation of clinical study samples is tracked (Figure 3). This includes dissection of tissue samples and nucleic acid extraction. Details of these sample workflow operations are stored in MendeLIMS, including volumes of any extracted macromolecules such as genomic DNA, concentration metrics and sample storage location. This greatly facilitates the management of these precious resources for population studies.
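Storing both volume and concentration makes it straightforward to work out how much material remains for a sample. A minimal sketch of that arithmetic, assuming concentration is recorded in ng/µL and volume in µL (the function names and the example values are illustrative and are not taken from MendeLIMS):

```python
def dna_mass_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in nanograms = concentration (ng/uL) x volume (uL)."""
    return concentration_ng_per_ul * volume_ul


def volume_after_withdrawal(volume_ul: float, withdrawn_ul: float) -> float:
    """Remaining volume after an aliquot is taken, e.g. for library prep."""
    if withdrawn_ul > volume_ul:
        raise ValueError("cannot withdraw more than the stored volume")
    return volume_ul - withdrawn_ul


# Illustrative values: a 30 uL extraction at 50 ng/uL, 10 uL used.
remaining_ul = volume_after_withdrawal(30.0, 10.0)
remaining_ng = dna_mass_ng(50.0, remaining_ul)
```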

Tracking molecular and genomic analysis

Molecular assays and sequencing library steps are also captured in MendeLIMS (Figure 3). Sequencing libraries may be entered as singleplex (i.e. one sample per library) or multiplex (i.e. multiple samples per library, with each sample tagged with a unique starting sequence). The multiplex indexing schemes are user-configurable, both for the number of samples that can be multiplexed on one lane and for the actual starting sequences used.
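A configurable scheme of this kind allows a multiplex library to be checked before it is saved: no two samples should share a tag sequence, and the number of samples should not exceed the limit set for the scheme. The following sketch shows one way such a check could look; the function name, the sample names and the tag sequences are hypothetical and do not come from MendeLIMS.

```python
from collections import Counter


def check_multiplex_library(sample_tags, max_samples):
    """Return a list of problems; an empty list means the library looks valid.

    sample_tags maps a sample name to its tag (starting) sequence.
    """
    problems = []
    if len(sample_tags) > max_samples:
        problems.append(
            f"{len(sample_tags)} samples exceed the limit of {max_samples}"
        )
    counts = Counter(sample_tags.values())
    for tag, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"tag {tag} is used by {n} samples")
    return problems
```

For example, `check_multiplex_library({"S1": "ACGT", "S2": "ACGT"}, 4)` reports the duplicated tag, while a library with distinct tags and an allowed sample count returns an empty list.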

Tracking the next generation sequencing workflow

Before an NGS analysis is started, a sequencing run is entered into the system by selecting existing libraries and placing them into separate lanes or partitions. Normally an entire sequencing run is entered. However, the system is also able to handle partial sequencing runs to accommodate the scenario where sequencing may be performed as a service and the run is shared between multiple groups who are not privy to each other’s results. MendeLIMS generates a unique sequencing run key based on the sequencing date, sequencing machine, and a unique sequential run number. Once the sequencing run has completed, the initial quality control (QC) metrics for the run can be entered into the system. This is currently handled by an offline Ruby script, but will be incorporated into the web application in the future. MendeLIMS supports any type of sequencing application including whole genome, exome, targeted and RNA-based sequencing studies. The system stores sequencing library and sample lineage, flow cell composition, and sequencing run metadata, along with run status and QC metrics for all runs (Figure 4). The sequencing data files (for example BAM alignment files or VCF variant calling files) are not stored in MendeLIMS per se but are on a storage cluster accessible to all researchers in the group. Additionally, since we use the MendeLIMS sequencing run key and sequencing library/sample nomenclature in the analysis directory and file names, the files are easily cross-referenced between MendeLIMS and the storage cluster.
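To make the run key and its use in file names concrete, the sketch below builds a key from the three components named above and derives a storage path from it. The exact layout (`YYMMDD_machine_NNNN`), the directory structure and the library name are assumptions for illustration; the text does not specify the format MendeLIMS actually uses.

```python
from datetime import date


def sequencing_run_key(run_date: date, machine: str, run_number: int) -> str:
    """Combine date, machine and sequential run number into one key.

    The YYMMDD_machine_NNNN layout is an assumed format, not the real one.
    """
    return f"{run_date:%y%m%d}_{machine}_{run_number:04d}"


def alignment_path(root: str, run_key: str, library_name: str) -> str:
    """Hypothetical location of a library's BAM file on the storage cluster."""
    return f"{root}/{run_key}/{library_name}.bam"


key = sequencing_run_key(date(2014, 3, 7), "HiSeq2500", 42)
path = alignment_path("/data/runs", key, "L000100")
```

Embedding the key in the path means that a file on the cluster can be traced back to its LIMS record, and vice versa, without a separate lookup table.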

Figure 4: Tracking the sequencing of clinical samples. One can follow a clinical sample from enrollment in a clinical study all the way through
to its sequencing. For example, from the sequencing run composition one can backtrack to the individual libraries and the original source DNA.
Screen shots show the various levels of querying. A sequencing library can be queried for additional information. When required, it is even possible to
determine the time of enrollment in a study and pull up relevant images from pathology.
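The backtracking from a lane to the originating patient amounts to a chain of joins across the lineage tables. The sketch below demonstrates the idea with an in-memory SQLite database. The four tables and their columns are a deliberately reduced, hypothetical schema (one sample per library, as in the singleplex case) and do not reproduce the actual MendeLIMS schema in Additional file 2; all row values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients  (id INTEGER PRIMARY KEY, study TEXT);
CREATE TABLE samples   (id INTEGER PRIMARY KEY, barcode TEXT,
                        patient_id INTEGER REFERENCES patients(id));
CREATE TABLE libraries (id INTEGER PRIMARY KEY, name TEXT,
                        sample_id INTEGER REFERENCES samples(id));
CREATE TABLE lanes     (run_key TEXT, lane_nr INTEGER,
                        library_id INTEGER REFERENCES libraries(id));
INSERT INTO patients  VALUES (1, 'StudyA');
INSERT INTO samples   VALUES (10, '001234', 1);
INSERT INTO libraries VALUES (100, 'L000100', 10);
INSERT INTO lanes     VALUES ('140307_HiSeq2500_0042', 3, 100);
""")


def trace_lane(run_key, lane_nr):
    """Walk from a lane back to library, source sample and patient."""
    return conn.execute(
        """
        SELECT lib.name, s.barcode, p.id, p.study
        FROM lanes ln
        JOIN libraries lib ON lib.id = ln.library_id
        JOIN samples s     ON s.id = lib.sample_id
        JOIN patients p    ON p.id = s.patient_id
        WHERE ln.run_key = ? AND ln.lane_nr = ?
        """,
        (run_key, lane_nr),
    ).fetchall()


lineage = trace_lane("140307_HiSeq2500_0042", 3)
```

A multiplex library would need an additional link table between libraries and samples, which is omitted here for brevity.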

Queries of MendeLIMS

All queries allow specification of multiple filter criteria, such as barcode range, date range, owner and protocol, which enables users to quickly find the samples of interest and then drill down to more detail. For example, when viewing the sample query result set (Figure 2), clicking on the sample barcode will bring up more comprehensive information regarding the sample, including pathology information. Clicking on the ‘QC’ link from a sequencing library query result set shows QC data for all sequencing runs for that library. Query results may be exported to a tab-delimited file for review or for incorporation with other local data.
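The combination of optional filters and tab-delimited export can be illustrated with a small, self-contained sketch. The field names, the sample rows and the function names are invented for the example, and barcodes are assumed to be zero-padded strings of equal width so that a string comparison gives the intended range; none of this is taken from the MendeLIMS code.

```python
import csv
import io
from datetime import date

FIELDS = ["barcode", "date", "owner", "protocol"]

# Made-up rows standing in for a sample table.
SAMPLES = [
    {"barcode": "001230", "date": date(2014, 1, 15), "owner": "alice", "protocol": "P1"},
    {"barcode": "001231", "date": date(2014, 2, 3), "owner": "bob", "protocol": "P1"},
    {"barcode": "001232", "date": date(2014, 2, 20), "owner": "alice", "protocol": "P2"},
]


def query_samples(rows, barcode_range=None, date_range=None, owner=None, protocol=None):
    """Keep rows matching every criterion supplied (None means no filter)."""
    def keep(row):
        if barcode_range and not barcode_range[0] <= row["barcode"] <= barcode_range[1]:
            return False
        if date_range and not date_range[0] <= row["date"] <= date_range[1]:
            return False
        if owner and row["owner"] != owner:
            return False
        if protocol and row["protocol"] != protocol:
            return False
        return True
    return [row for row in rows if keep(row)]


def to_tsv(rows):
    """Render query results as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


hits = query_samples(
    SAMPLES, date_range=(date(2014, 2, 1), date(2014, 2, 28)), owner="alice"
)
tsv = to_tsv(hits)
```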

Reagent tracking

Ordering of reagents, equipment and supplies, though not necessarily typical of a LIMS implementation, has been included in MendeLIMS. This feature enables, for example, the tracking of reagent and supply batches, which is useful in troubleshooting failed sequencing runs, or the tracking of all expenditures against a specific funding account.

Security

Hypertext Transfer Protocol Secure (HTTPS) is supported and is currently implemented for user login pages, but is easily extended to other pages as needed. User authentication is via a userid and password, and access to functionality is controlled via user roles, which are defined and managed from the website by a user with the ‘admin’ role. Other roles available include ‘clinical’, which allows create/modify access to clinical study sample information; ‘clin_admin’, which allows modification of the drop-down lists used to validate sample data; and ‘researcher’, which allows create/modify access to sequencing libraries and sequencing runs. The user manual in the supplementary material provides descriptions of all available user roles.
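Role-based access of this kind reduces to a lookup from roles to permitted actions. In the sketch below the four role names come from the description above, but the permission names and the `can` helper are hypothetical labels chosen for the example; the real mapping is defined in the application and documented in the user's guide.

```python
# Role names are from the text; permission names are illustrative only.
ROLE_PERMISSIONS = {
    "admin": {"manage_users"},
    "clinical": {"edit_clinical_samples"},
    "clin_admin": {"edit_sample_dropdowns"},
    "researcher": {"edit_libraries", "edit_sequencing_runs"},
}


def can(user_roles, permission):
    """True if any of the user's roles grants the requested permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles
    )
```

A user holding several roles receives the union of their permissions, so `can(["clinical", "researcher"], "edit_libraries")` is true while `can(["clinical"], "edit_libraries")` is not.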

Given the extreme complexity of dealing with private health information (PHI), MendeLIMS is not designed to incorporate PHI-related clinical data. MendeLIMS does store a patient identifier that is the link between MendeLIMS and other patient clinical information databases that are securely stored in a very limited access environment. The identifier is stored as a binary encrypted field in the MySQL database, and access to this field via the web application is limited to users with a ‘clinical’ or ‘clin_admin’ role; other users only see a unique system-generated patient identifier which for all intents and purposes is anonymous.
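The field-level restriction can be pictured as a view function that decides which identifier a given user sees. This sketch covers only the visibility rule; encryption of the stored value is left out, and the field names and record values are placeholders rather than MendeLIMS column names.

```python
# Roles allowed to see the linking identifier, per the description above.
PRIVILEGED_ROLES = {"clinical", "clin_admin"}


def patient_view(record, user_roles):
    """Return only the patient fields this user is allowed to see."""
    view = {"system_patient_id": record["system_patient_id"]}
    if PRIVILEGED_ROLES.intersection(user_roles):
        # Hypothetical field holding the link to external clinical databases.
        view["clinical_patient_ref"] = record["clinical_patient_ref"]
    return view


record = {"system_patient_id": 4711, "clinical_patient_ref": "REF-0001"}
```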

In our current implementation used by several groups at Stanford University, MendeLIMS is integrated into an internal network, within a secured firewall. All database transactions are logged and time-stamped to provide an audit trail, and automated database backups are run daily. An administrator can readily generate an audit report to keep track of changes made by users.
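A time-stamped audit trail of the kind described can be sketched as an append-only table plus a report query. The example uses an in-memory SQLite database for brevity, and the table layout, column names and function names are assumptions; the text does not describe how MendeLIMS structures its log.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log ("
    " logged_at TEXT, username TEXT, action TEXT,"
    " table_name TEXT, record_id INTEGER)"
)


def log_change(username, action, table_name, record_id):
    """Append one time-stamped entry (UTC, ISO 8601) to the audit log."""
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), username, action,
         table_name, record_id),
    )


def audit_report(username):
    """List a user's changes in the order they were made."""
    return conn.execute(
        "SELECT action, table_name, record_id FROM audit_log"
        " WHERE username = ? ORDER BY rowid",
        (username,),
    ).fetchall()


log_change("alice", "create", "samples", 10)
log_change("bob", "update", "libraries", 100)
log_change("alice", "update", "samples", 10)
report = audit_report("alice")
```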

Discussion

There are commercial LIMS solutions available for NGS labs, some of which have been implemented at major genomic research centers. For example, GeneSifter LAB Edition[1] has been implemented at Vanderbilt University, Progeny LIMS[2] at the University of Pittsburgh and Clarity LIMS[3] at the University of Washington. These systems have significant capabilities. However, the cost in time and money to implement them is often out of reach for smaller organizations, particularly those that rely on funding from research grants or that require unique workflows that cannot be implemented readily in a system designed for a larger institution. Given these resource constraints, open source options are of greater interest to this category of organizations.

There are several simpler LIMS systems covering clinical study samples such as BonsaiLIMS[4], PASSIM[5] and SLIMS.[6] These LIMS offer basic sample management but do not offer comprehensive clinical sample tracking or the ability to define sequencing libraries and flow cell/sequencing run composition for NGS processing. More recent offerings which are available for open source installation and do support NGS processing include Galaxy LIMS[7] and GNomEx LIMS.[8] These systems address the flow from DNA/RNA extraction to sequencing library to flow cell/sequencing run. GNomEx LIMS also provides some analysis workflow capability and integrates data visualization via genome browsers such as UCSC Genome Browser[9] or Integrative Genomics Viewer (IGV).[10] Galaxy LIMS takes advantage of the Galaxy infrastructure to also provide analysis workflow and data visualization. However, none of these systems natively provide tracking of clinical study data such as consent protocols, pathology and histopathology information. Sample lineage and sample tracking via consistent nomenclature, drill-down to various levels of source data, and freezer container/location information are also not addressed.

Another open source option is QTREDS.[11] This LIMS has a strong focus on experimental protocols for sample preparation and tracks detailed steps which MendeLIMS and other systems do not specifically track, such as sonication, end repair or ligation as part of exome library preparation. QTREDS also manages inventory of reagents for sample preparation and triggers low stock level alerts. However, there is no tracking associated with clinical study samples, and NGS support is limited to the sequencing libraries and their associated sequencing status, rather than flow cell composition and the sequencing run itself. In contrast, MendeLIMS provides full sample lineage tracking back to the patient and to the clinic and consent protocol where the sample originated, as well as support for all major processing steps through to the DNA/RNA sequencing and QC (Table 1). MendeLIMS also records which cluster and sequencing kit versions were used for a particular run, and which publications (if any) reference the results from that run.

Table 1. Comparison among different LIMS systems

LIMS software                                            MendeLIMS  GNomEx   Galaxy LIMS  QTREDS
Clinical study patient samples
  Patient data (gender/race, MRN, pathology, histology)  Yes        No       No           No
  Sample processing (dissections, extractions)           Yes        No       No           Yes (a)
  Sample location tracking                               Yes        No       No           Yes
Arrays, libraries, sequencing runs
  Molecular assays (genomic arrays, ddPCR, ..)           Yes        Yes      No           Yes
  Sequencing library prep (singleplex and multiplex)     Yes        Yes      Yes          Yes (b)
  Flow cell/sequencing run setup                         Yes        Yes      Yes          No
Post-sequencing analysis
  Sequencing QC                                          Yes        nd       Yes (c)      No
  Analysis workflow                                      No         Yes      Yes (c)      No
Security/Audit trail
  Authorization via user roles                           Yes        Yes      Yes          Yes
  Audit trail                                            Yes        Yes      nd           nd
  HTTPS/SSL security                                     Yes        nd       nd           Yes
General
  Email/notification capability                          Yes        Yes      Yes (c)      Yes
  Customizable lists for data validation                 Yes        Yes      Yes          Yes
  Instrument integration                                 No         No       Yes (d)      No
  Attach files to samples/libraries/sequencing runs      Yes        nd       No           nd
  Visualization of results                               No         Yes (e)  Yes (c)      No
Other
  Project based billing                                  No         Yes      No           No
  Reagent inventory management                           No         No       No           Yes
  Publicly available User Guide/Demos                    Yes        Yes      No           Yes

(a) Detailed information tracked regarding sample prep.
(b) Singleplex libraries only.
(c) Functionality provided via integration with Galaxy and genome browsers.
(d) Integration with HiSeq 2000 only.
(e) Functionality provided via integration with GenoPub.

nd: Could not be determined from public documentation whether the functionality is provided.

Conclusions

Clinical population studies using NGS require management of hundreds if not thousands of samples, including various intermediate processing steps, and the resulting sequencing data. A LIMS is critical to the effective management of this data and the generation of reproducible results. The currently available open source or commercial systems may meet the needs of some research groups. However, for those groups where the time and monetary cost of a comprehensive commercial system is prohibitive, there is otherwise no end-to-end open source solution covering enrollment of patients in a clinical study through genome sequencing analysis. Our system addresses these needs with a specific focus on the seamless integration of clinical study enrollment through to NGS.

In MendeLIMS, all data is consolidated into one authoritative centrally accessible source repository, eliminating multiple distributed spreadsheets. Samples are traceable from a lane on a sequencing run, back to the patient diagnosis, pathology, and all processing steps in between. In conjunction with a standard barcoding nomenclature and flexible query capability, this significantly reduces errors in sample tracking, provides a comprehensive view of data being sequenced and has resulted in MendeLIMS becoming an invaluable tool for the management of our clinical sequencing studies.

Availability and requirements

Project name: MendeLIMS

Project home page: http://dna-discovery.stanford.edu/software/mendelims/

Project demo site: http://mendelims.stanford.edu/

Operating system(s): Platform independent

Programming Language(s): Ruby, Ruby on Rails, HTML, Javascript

Server requirements: Apache2, Mongrel or Passenger, Ruby 1.9.3+, Rails 3.2.x, MySQL 5.0

Web browser requirements: Firefox, Chrome, IE, Safari

License: Any restrictions to use by non-academics: None

Abbreviations

NGS: Next generation sequencing

LIMS: Laboratory information management system

RoR: Ruby on Rails

RDBMS: Relational database management system

CSS: Cascading style sheet

TCGA: The Cancer Genome Atlas

ID: Identifier

QC: Quality control

HTTPS: Hypertext transfer protocol secure

PHI: Private health information

IGV: Integrative Genomics Viewer

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SMG and HPJ drafted the manuscript. SMG wrote the software, set up the website implementation and wrote the user guide. SMG and HPJ conceived and designed the project. Both authors read and approved the final manuscript.

Acknowledgements

We would like to acknowledge Rebecca Kim for help with design for sample acquisition and processing; Jason Buenrostro and Erik Hopmans for help with design for sequencing libraries and flow cells; Jennifer Palm for her assistance with graphics and Curt Palm for his help with server configuration for the MendeLIMS demo site.

This work was supported by the following grants from the NIH: RC2HG005570 to S.M.G. and H.P.J., DK56339 to H.P.J., and 2P01HG000205 to S.M.G. and H.P.J. H.P.J. was supported by a Research Scholar Grant (RSG-13-297-01-TBG) from the American Cancer Society. In addition, H.P.J. received support from the Doris Duke Clinical Foundation, the Clayville Foundation, the Liu BieJu Cha and Family Fellowship in Cancer and the Howard Hughes Medical Foundation.

References

  1. "GeneSifter Lab Edition". PerkinElmer, Inc. http://www.geospiza.com/Products/LabEdition.shtml. 
  2. "Progeny LIMS". Progeny Genetics, LLC. http://www.progenygenetics.com/lims/. 
  3. "Clarity LIMS". GenoLogics Life Sciences Software, Inc. http://www.genologics.com/clarity-lims/. 
  4. Bath, Timothy G.; Bozdag, Selcuk; Afzal, Vackar; Crowther, Daniel (2011). "LimsPortal and BonsaiLIMS: development of a lab information management system for translational medicine". Source Code for Biology and Medicine 6: 9. doi:10.1186/1751-0473-6-9. PMC PMC3113716. PMID 21569484. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113716. 
  5. Viksna, J.; Celms, E.; Opmanis, M.; Podnieks, K.; Rucevskis, P.; Zarins, A.; Barrett, A.; Neogi, S.G.; Krestyaninova, M.; McCarthy, M.I.; Brazma, A.; Sarkans, U. (2007). "PASSIM: An open source software system for managing information in biomedical studies". BMC Bioinformatics 8: 52. doi:10.1186/1471-2105-8-52. PMC PMC1803798. PMID 17291344. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1803798. 
  6. Van Rossum, T.; Tripp, B.; Daley, D. (2010). "SLIMS: A user-friendly sample operations and inventory management system for genotyping labs". Bioinformatics 26 (14): 1808-1810. doi:10.1093/bioinformatics/btq271. PMC PMC2894515. PMID 20513665. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2894515. 
  7. Scholtalbers, J.; Rossler, J.; Sorn, P.; de Graaf, J.; Boisguerin, V.; Castle, J.; Sahin, U. (2013). "Galaxy LIMS for next-generation sequencing". Bioinformatics 29 (9): 1233-1234. doi:10.1093/bioinformatics/btt115. PMID 23479349. 
  8. Nix, David A.; Di Sera, Tonya L.; Dalley, Brian K.; Milash, Brett A.; Cundick, Robert M.; Quinn, Kevin S.; Courdy, Samir J. (2010). "Next generation tools for genomic data generation, distribution, and visualization". BMC Bioinformatics 11: 455. doi:10.1186/1471-2105-11-455. PMC PMC2944281. PMID 20828407. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2944281. 
  9. Kent, W.J.; Sugnet, C.W; Furey, T.S.; Roskin, K.M.; Pringle, T.H.; Zahler, A.M.; Haussler, D. (2002). "The human genome browser at UCSC". Genome Research 12 (6): 996-1006. doi:10.1101/gr.229102. PMC PMC186604. PMID 12045153. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC186604. 
  10. Thorvaldsdottir, H.; Robinson, J.T.; Mesirov, J.P. (2013). "Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration". Briefings in Bioinformatics 14 (2): 178-192. doi:10.1093/bib/bbs017. PMC PMC3603213. PMID 22517427. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3603213. 
  11. Palla, P.; Frau, G.; Vargiu, L.; Rodriguez-Tome, P. (2014). "QTREDS: A Ruby on Rails-based platform for omics laboratories". BMC Bioinformatics 15 (Suppl 1): S13. doi:10.1186/1471-2105-15-S1-S13. PMC PMC4015218. PMID 24564791. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015218. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.