Difference between revisions of "Journal:Personalized Oncology Suite: Integrating next-generation sequencing data and whole-slide bioimages"

From LIMSWiki
Jump to navigationJump to search
(Finished adding content.)
Line 19: Line 19:
|download    = [http://www.biomedcentral.com/content/pdf/1471-2105-15-306.pdf http://www.biomedcentral.com/content/pdf/1471-2105-15-306.pdf] (PDF)
|download    = [http://www.biomedcentral.com/content/pdf/1471-2105-15-306.pdf http://www.biomedcentral.com/content/pdf/1471-2105-15-306.pdf] (PDF)
}}
}}
{{ombox
 
| type      = content
| style    = width: 500px;
| text      = This article should not be considered complete until this message box has been removed. This is a work in progress.
}}
==Abstract==
==Abstract==
'''Background:''' Cancer immunotherapy has recently entered a remarkable renaissance phase with the approval of several agents for treatment. Cancer treatment platforms have demonstrated profound tumor regressions including complete cure in patients with metastatic cancer. Moreover, technological advances in next-generation sequencing (NGS) as well as the development of devices for scanning whole-slide bioimages from tissue sections and image analysis software for quantitation of tumor-infiltrating lymphocytes (TILs) allow, for the first time, the development of personalized cancer immunotherapies that target patient specific mutations. However, there is currently no [[bioinformatics]] solution that supports the integration of these heterogeneous datasets.
'''Background:''' Cancer immunotherapy has recently entered a remarkable renaissance phase with the approval of several agents for treatment. [[Cancer informatics|Cancer treatment platforms]] have demonstrated profound tumor regressions including complete cure in patients with metastatic cancer. Moreover, technological advances in next-generation sequencing (NGS) as well as the development of devices for scanning whole-slide bioimages from tissue sections and [[Bioimage informatics|image analysis software]] for quantitation of tumor-infiltrating lymphocytes (TILs) allow, for the first time, the development of personalized cancer immunotherapies that target patient specific mutations. However, there is currently no [[bioinformatics]] solution that supports the integration of these heterogeneous datasets.


'''Results:''' We have developed a bioinformatics platform – Personalized Oncology Suite (POS) – that integrates clinical data, NGS data and whole-slide bioimages from tissue sections. POS is a web-based platform that is scalable, flexible and expandable. The underlying database is based on a data warehouse schema, which is used to integrate information from different sources. POS stores clinical data, genomic data (SNPs and INDELs identified from NGS analysis), and scanned whole-slide images. It features a genome browser as well as access to several instances of the bioimage management application Bisque. POS provides different visualization techniques and offers sophisticated upload and download possibilities. The modular architecture of POS allows the community to easily modify and extend the application.
'''Results:''' We have developed a bioinformatics platform – Personalized Oncology Suite (POS) – that integrates clinical data, NGS data and whole-slide bioimages from tissue sections. POS is a web-based platform that is scalable, flexible and expandable. The underlying database is based on a data warehouse schema, which is used to integrate [[information]] from different sources. POS stores clinical data, [[Genomics|genomic]] data (SNPs and INDELs identified from NGS analysis), and scanned whole-slide images. It features a genome browser as well as access to several instances of the bioimage management application Bisque. POS provides different visualization techniques and offers sophisticated upload and download possibilities. The modular architecture of POS allows the community to easily modify and extend the application.


'''Conclusions:''' The web-based integration of clinical, NGS, and imaging data represents a valuable resource for clinical researchers and future application in medical oncology. POS can be used not only in the context of cancer immunology but also in other studies in which NGS data and images of tissue sections are generated. The application is open-source and can be downloaded at http://www.icbi.at/POS.
'''Conclusions:''' The web-based integration of clinical, NGS, and imaging data represents a valuable resource for clinical researchers and future application in medical oncology. POS can be used not only in the context of cancer immunology but also in other studies in which NGS data and images of tissue sections are generated. The application is open-source and can be downloaded at http://www.icbi.at/POS.
Line 166: Line 162:
<blockquote>'''Figure 1: Software architecture of POS.''' The JBoss Application Server of POS is shown as the central rectangle containing different JSF libraries.<br/>On the right hand side the attached Authorization and Authentication System (AAS) is depicted. POS uses PostgreSQL as database management<br />system and applies EclipseLink for the object-relational mapping. The Bioconductor package Gviz handles rendering of the genome browser<br />tracks, and the R package Rserver provides an R server available through a network connection. On top, distributed Bisque instances are shown.<br />The POS Image Uploader, a JavaFX based standalone application, is outlined on the top left in the figure. This application enables users<br />to upload several images at once.</blockquote>
<blockquote>'''Figure 1: Software architecture of POS.''' The JBoss Application Server of POS is shown as the central rectangle containing different JSF libraries.<br/>On the right hand side the attached Authorization and Authentication System (AAS) is depicted. POS uses PostgreSQL as database management<br />system and applies EclipseLink for the object-relational mapping. The Bioconductor package Gviz handles rendering of the genome browser<br />tracks, and the R package Rserver provides an R server available through a network connection. On top, distributed Bisque instances are shown.<br />The POS Image Uploader, a JavaFX based standalone application, is outlined on the top left in the figure. This application enables users<br />to upload several images at once.</blockquote>


The underlying database is based on a data warehouse schema. This schema was chosen as it is widely used for integrating information from different sources. All defined entities are outlined in Additional file 1. POS uses the Java library EclipseLink for object-relational mapping. In the default configuration POS runs with PostgreSQL, but can be easily exchanged with another relational database.  
The underlying database is based on a data warehouse schema. This schema was chosen as it is widely used for integrating information from different sources. All defined entities are outlined in Additional file 1. POS uses the Java library EclipseLink for object-relational mapping. In the default configuration POS runs with [[PostgreSQL]], but can be easily exchanged with another relational database.  


{|  
{|  
Line 202: Line 198:
As visualization of identified mutations tremendously supports the interpretation of these results, genome browsers have been developed to display mutations in the context of a reference genome.<ref name="PabingerASurv14" /> POS includes a genome browser that features a combined view of different tracks, each containing a dedicated plot. Figure 3 shows a screenshot of the genome browser view within POS. On top of the genome browser panel, the region of interest can be specified by defining chromosome number, start and end position or by choosing a gene of interest. The user can select several patients at once and mutations for each patient will be displayed in separate tracks.  
As visualization of identified mutations tremendously supports the interpretation of these results, genome browsers have been developed to display mutations in the context of a reference genome.<ref name="PabingerASurv14" /> POS includes a genome browser that features a combined view of different tracks, each containing a dedicated plot. Figure 3 shows a screenshot of the genome browser view within POS. On top of the genome browser panel, the region of interest can be specified by defining chromosome number, start and end position or by choosing a gene of interest. The user can select several patients at once and mutations for each patient will be displayed in separate tracks.  


[[File:Fig3 Dander BMCBioinformatics2014 15.jpg|700px]]
[[File:Fig3 Dander BMCBioinformatics2014 15.jpg|800px]]
{{clear}}
{{clear}}
<blockquote>'''Figure 3: View of the genome browser.''' A genome browser visualizing mutations in the context of the reference genome including publicly available annotations is shown. The first track depicts the ideogram of the chosen chromosome, followed by an axis showing its genomic coordinates. Next, publicly available annotations derived from BioMart<ref name="KasprzykBio11">{{cite journal |title=BioMart: Driving a paradigm change in biological data management |journal=Database |author=Kasprzyk, A. |volume=2011 |pages=bar049 |year=2011 |doi=10.1093/database/bar049 |pmid=22083790 |pmc=PMC3215098}}</ref> are depicted. The bottom track holds information about uploaded mutations. The patient name and the shown mutations were randomly generated.</blockquote>
<blockquote>'''Figure 3: View of the genome browser.''' A genome browser visualizing mutations in the context of the reference genome including publicly available<br />annotations is shown. The first track depicts the ideogram of the chosen chromosome, followed by an axis showing its genomic coordinates. Next,<br />publicly available annotations derived from BioMart<ref name="KasprzykBio11">{{cite journal |title=BioMart: Driving a paradigm change in biological data management |journal=Database |author=Kasprzyk, A. |volume=2011 |pages=bar049 |year=2011 |doi=10.1093/database/bar049 |pmid=22083790 |pmc=PMC3215098}}</ref> are depicted. The bottom track holds information about uploaded mutations. The patient<br />name and the shown mutations were randomly generated.</blockquote>
 
===Bioimages===
Whole-slide bioimages can be used to estimate the density of TILs. However, due to the size of the files the required disk space can be very large. Therefore, the raw image files are not stored in POS. In order to integrate whole-slide images within POS, the application connects to several instances of the bioimage management application Bisque<ref name="KvilekvalBis10" /> as outlined in Figure 1. POS supports the following formats: 1) SVS format from Aperio, 2) VSI format from Olympus, 3) Hamamatsu formats NDPI and VMS, 4) TIFF format from Trestle, 5) Leica’s SCN format, 6) formats ZVI and CZI from Zeiss, and 7) OME.TIFF format from the Open Microscopy Environment OME.<ref name="GoodeOpen13">{{cite journal |title=OpenSlide: A vendor-neutral software foundation for digital pathology |journal=Journal of Pathology Informatics |author=Goode, A.; Gilbert, B.; Harkes, J.; Jukic, D.; Satyanarayanan, M. |volume=4 |pages=27 |year=2013 |doi=10.4103/2153-3539.119005 |pmid=24244884 |pmc=PMC3815078}}</ref><ref name="LinkertMeta10">{{cite journal |title=Metadata matters: access to image data in the real world |journal=Journal of Cell Biology |author=Linkert, M.; Rueden, C.T.; Allan, C.; Burel, J-M.; Moore, W.; Patterson, A.; Loranger, B.; Moore, J.; Neves, C.; MacDonald, D.; Tarkowska, A.; Sticco, C.; Hill, E.; Rossner, M.; Eliceiri, K.W.; Swedlow, J.R. |volume=189 |issue=5 |pages=777-82 |year=2010 |doi=10.1083/jcb.201004104 |pmid=20513764 |pmc=PMC2878938}}</ref>
 
===Image upload===
POS supports different ways for uploading images. The first option is a direct upload module, where a patient needs to be selected before the upload starts. The second possibility allows access to already uploaded images within a connected Bisque instance and their assignment to selected patients in POS. The third option is an external upload application supporting the batch upload of multiple images at once. This application is also able to assign the correct relations between uploaded images and patients within POS.
 
===Bisque connection===
The connection to Bisque instances is administered by POS, which handles correct URL, port, username and password settings. In addition, tailored Java servlets for managing the encrypted communication between POS and Bisque have been developed. Downscaling and tiling of the images is performed by the connected Bisque systems. POS is able to display the created tiles in a Google maps like manner, using an adapted version of the interactive JavaScript widget PanoJS3. Hence, only tiles of the currently displayed region are fetched from Bisque. A screenshot of the POS image viewer is displayed in Figure 4.
 
[[File:Fig4 Dander BMCBioinformatics2014 15.jpg|800px]]
{{clear}}
<blockquote>'''Figure 4: Image viewer.''' This Figure depicts a screenshot of an example of a whole-slide bioimage. The image is stored at an external Bisque instance<br />and only tiles that correspond to the current image section are transferred to POS. General information about the patient and the image are depicted<br />on the right hand side. Various attributes like width, height, resolution, and information about the scanner are directly fetched from Bisque. Shown clinical<br />data were randomly generated.</blockquote>
 
===Flexible batch upload module===
As the JSF based image upload form of POS does not support batch upload of several images at once, the standalone application POS Image Uploader has been developed. This application is based on JavaFX and can be started on all systems where Java is installed. Figure 1 depicts this standalone application at the top. The POS Image Uploader is able to acquire the URL and corresponding login information of Bisque instances directly from POS. During the batch upload, each image is uploaded to a selected Bisque instance. Bisque returns identifiers for each uploaded image, which are stored within the POS database and linked to related patients. The communication between the standalone tool and the web-based application is managed by tailored web services.
 
===Authorization and authentication system===
As POS holds confidential and patient related data, the application needs to be secured. POS is secured by an authorization and authentication system (AAS) where participating users and institutes can be configured within this easy to use user management system.<ref name="MaurerMARS05">{{cite journal |title=MARS: microarray analysis, retrieval, and storage system |journal=BMC Bioinformatics |author=Maurer, M.; Molidor, R.; Sturn, A.; Hartler, J.; Hackl, H.; Stocker, G.; Prokesch, A.; Scheideler, M.; Trajanoski, Z. |volume=6 |pages=101 |year=2005 |doi=10.1186/1471-2105-6-101 |pmid=15836795 |pmc=PMC1090551}}</ref> The backend of the used AAS can either be a simple XML file or a web-based application running on a dedicated JBoss Application Server (see Figure 1). For tailored user access, POS applies different user roles, provided by the AAS. Access roles are defined for clinical, NGS, and imaging data as well as for administration. In order to provide custom access profiles, users can be assigned to several roles at once. Furthermore, all web services, used by the POS Image Uploader, authenticate each single connection via the AAS.
 
Finally, POS users can also share their data with other participating users, with the possibility to restrict access to certain data types. It should be noted that as access to patient related data needs to be secured, the implemented sharing concept and the AAS do not support public access to any data type. Rather, users can specify the type of data that can be shared with other users including images, genetic data, or clinical data. If an external user wants access to this integrated data a request must be performed and all participating institutes will be informed about this request.
 
===Installation===
As POS is a web-based application, it can be accessed from any operating system. The only requirements are an up to date web browser and a network connection to the POS server. The server itself is able to run on any operating system with installed Java and R, and has been extensively tested on Linux and Windows machines. The database backend is interchangeable - POS has been tested with the database systems PostgreSQL and MySQL. A detailed installation guide for POS is available at the project home page, accessible at http://www.icbi.at/POS. As POS makes use of several Bisque installations, an installation guide for Bisque is provided at the same location.
 
==Conclusions==
POS is a web-based application combining clinical and biomolecular data including NGS data and bioimages. Due to its modular and flexible architecture it can be easily extended and adapted to different requirements. POS provides an intuitive user interface supporting data upload, manipulation and visualization of integrated data types. In summary, POS represents an effective solution for current challenges in clinical cancer research. The suite can be used not only in the context of personalized cancer immunotherapy but also in other studies where NGS data and images of tissue sections are generated.
 
==Availability and requirements==
'''Project name:''' Personalized Oncology Suite (POS)
 
'''Project home page:''' http://www.icbi.at/POS
 
'''Operating system:''' Platform independent
 
'''Programming language:''' Java, R/Bioconductor
 
'''License:''' GNU AGPL
 
'''Any restrictions to use by non-academics:''' No
 
==Abbreviations==
 
AAS: Authorization and Authentication System
 
AGPL: Affero General Public License
 
COSMIC: Catalogue of Somatic Mutations in Cancer
 
CRUD: Create, Read, Update and Delete
 
INDEL: Small insertion or deletion
 
JSF: JavaServer Faces
 
POS: Personalized Oncology Suite
 
SNP: Single-nucleotide polymorphism
 
TIL: Tumor-infiltrating lymphocyte
 
TNM: Classification of Malignant Tumors
 
URL: Uniform Resource Locator
 
VCF: Variant Call Format
 
==Competing interests==
The authors declare that they have no competing interests.
 
==Authors’ contributions==
AD is the principle developer of POS. MB, MS and BH developed parts of the application. SP as well as ZT reviewed and tested POS and provided suggestions for additional features. All authors participated in writing of the manuscript, read and approved the final manuscript.
 
==Acknowledgements==
This work was supported by ONCOTYROL (FFG, Austrian Federal Ministries BMVIT/BMWFJ) and a grant from the Standortagentur Tirol.


==References==
==References==

Revision as of 20:44, 2 December 2015

Full article title Personalized Oncology Suite: Integrating next-generation sequencing data and whole-slide bioimages
Journal BMC Bioinformatics
Author(s) Dander, Andreas; Baldauf, Matthias; Sperk, Michael; Pabinger, Stephan; Hiltpolt, Benjamin; Trajanoski, Zlatko
Author affiliation(s) Innsbruck Medical University, AIT-Austrian Institute of Technology, Oncotyrol GmbH
Primary contact Email: zlatko.trajanoski@i-med.ac.at
Year published 2014
Volume and issue 15
Page(s) 306
DOI 10.1186/1471-2105-15-306
ISSN 1471-2105
Distribution license Creative Commons Attribution 4.0 International
Website http://www.biomedcentral.com/1471-2105/15/306
Download http://www.biomedcentral.com/content/pdf/1471-2105-15-306.pdf (PDF)

Abstract

Background: Cancer immunotherapy has recently entered a remarkable renaissance phase with the approval of several agents for treatment. Cancer treatment platforms have demonstrated profound tumor regressions including complete cure in patients with metastatic cancer. Moreover, technological advances in next-generation sequencing (NGS) as well as the development of devices for scanning whole-slide bioimages from tissue sections and image analysis software for quantitation of tumor-infiltrating lymphocytes (TILs) allow, for the first time, the development of personalized cancer immunotherapies that target patient specific mutations. However, there is currently no bioinformatics solution that supports the integration of these heterogeneous datasets.

Results: We have developed a bioinformatics platform – Personalized Oncology Suite (POS) – that integrates clinical data, NGS data and whole-slide bioimages from tissue sections. POS is a web-based platform that is scalable, flexible and expandable. The underlying database is based on a data warehouse schema, which is used to integrate information from different sources. POS stores clinical data, genomic data (SNPs and INDELs identified from NGS analysis), and scanned whole-slide images. It features a genome browser as well as access to several instances of the bioimage management application Bisque. POS provides different visualization techniques and offers sophisticated upload and download possibilities. The modular architecture of POS allows the community to easily modify and extend the application.

Conclusions: The web-based integration of clinical, NGS, and imaging data represents a valuable resource for clinical researchers and future application in medical oncology. POS can be used not only in the context of cancer immunology but also in other studies in which NGS data and images of tissue sections are generated. The application is open-source and can be downloaded at http://www.icbi.at/POS.

Keywords: Personalized oncology; Data integration; Next-generation sequencing; Whole-slide bioimaging; Application; Open-source

Background

Cancer immunotherapy has recently entered a remarkable renaissance phase with the approval of several agents for treatment.[1] Cancer immunotherapies that involve the use of the adaptive immune system, such as anti-checkpoint antibodies and adoptive T-cell therapies, have demonstrated profound tumor regressions including complete cure in patients with metastatic cancer. Technological advances in next-generation sequencing (NGS) as well as the development of devices for scanning whole-slide images from tissue sections and image analysis software for quantitation of tumor-infiltrating lymphocytes (TILs) allow, for the first time, the development of personalized cancer immunotherapies that target patient specific mutations. The use of NGS technologies to characterize tumor samples enables one not only to comprehensively study the interactions between human cancers and the immune system, but also to identify targets for patient stratification. Moreover, the quantitation of TILs will improve therapeutic efficacy, even in the absence of immunotherapy. It will enable a precise characterization of the immune infiltrates in the tumor and will help to identify mechanisms of tumor regression and disentangle the complex tumor-immune cell interactions. For example, understanding the molecular basis of the interactions between cytotoxic chemotherapeutics or targeted anti-cancer agents and the immune system is essential for the development of optimal therapeutic schemes and in the long run will result in clinical benefit for the patients.

However, the real value of the disparate datasets can be truly exploited only when the data are integrated. In our experience it is of utmost importance to establish a local database hosting only the necessary data. Only pre-processed and normalized data will be stored in a dedicated relational database whereas primary data are archived at separate locations including public repositories.[2] To this end, a database that integrates clinical, NGS, and bioimaging data would be extremely helpful for clinical cancer research and in near future also for routine applications in medical oncology. However, to the best of our knowledge there is currently no application that supports this integration. As of today there are different applications integrating either clinical data and NGS data or bioimages (Table 1) but no integrated solution has been created. We therefore developed the bioinformatics platform Personalized Oncology Suite (POS) to overcome this bottleneck and support the researchers working in this exciting field.

Table 1. This table compares POS with other applications in the context of data integration
Application Edit Clinical TNM Sequencing Vis. Bioimages Public Ref.
BioIMAX Y N N N N Whole-slide N [3]
Bisque Y N N N N Whole-slide N [4]
Galaxy LIMS Y N N Y N N N [5]
NG6 N N N Raw 454 and HiSeq N N N [6]
NGS tools Y N N Raw Illumina N N DAS[7] [8]
OMERO Y N N N N N N [9]
ONCO-i2b2 N Y Y N N N SNOMED [10]
openBIS Y N N Raw Illumina N N N [11]
Taverna N N N Y N N Y [12]
POS Y Y Y vcf Files Y Whole-slide COSMIC[13] -
The columns Clinical and TNM show if these data types are available. Sequencing depicts which type of next-generation sequencing data can be uploaded, and the column Vis. shows if mutations can be visualized. The column Bioimages shows which type of images can be used and the final column Public states available public annotations.

Implementation

In order to be scalable, flexible, and expandable, POS makes use of state of the art software engineering techniques and architectures like the Java Enterprise Edition 6 (J2EE 6) technology stack. It is a web-based platform relying on the JBoss Application Server in version 7.1.1. The modular three-tier architecture (web frontend, application core, database backend) and the release under the open-source license GNU AGPL enables the community to easily modify and extend the application with further functionalities. Figure 1 outlines the software architecture of POS and depicts the main used libraries. PrimeFaces and PrimeFaces Extensions are used for the creation of JSF components, whereas Hibernate Validator provides input validation of user entries. As access scopes of Java Beans are crucial within JavaEE applications, Apache CODI is used to include additional scopes. Due to the fact that POS deals with different types of collections, the Guava libraries were chosen to support POS with a set of helpful functionalities regarding collections. Used Bisque instances are shown at the top of Figure 1. Furthermore, the standalone application POS Image Uploader allows batch uploading of numerous images at once.

Fig1 Dander BMCBioinformatics2014 15.jpg

Figure 1: Software architecture of POS. The JBoss Application Server of POS is shown as the central rectangle containing different JSF libraries.
On the right hand side the attached Authorization and Authentication System (AAS) is depicted. POS uses PostgreSQL as database management
system and applies EclipseLink for the object-relational mapping. The Bioconductor package Gviz handles rendering of the genome browser
tracks, and the R package Rserver provides an R server available through a network connection. On top, distributed Bisque instances are shown.
The POS Image Uploader, a JavaFX based standalone application, is outlined on the top left in the figure. This application enables users
to upload several images at once.

The underlying database is based on a data warehouse schema. This schema was chosen as it is widely used for integrating information from different sources. All defined entities are outlined in Additional file 1. POS uses the Java library EclipseLink for object-relational mapping. In the default configuration POS runs with PostgreSQL, but can be easily exchanged with another relational database.

Additional file 1. Database schema of the Personalized Oncology Suite.
The database schema of POS is based on a data warehouse schema. Therefore, the central entity represents patients. Each patient belongs to an institute, which stores the externalid referencing an institute within the attached Authorization and Authentication System. It can be seen that each institute can be connected to an imagerepository containing information about the connection to a Bisque instance. The entity institutesharing manages information about shared data. Clinicaldata as well as tnm and immunoscore for staging of cancer are related to the entity patient. For the integration of somatic mutations within POS the entities snp and indel are used. The entity analysis contains metadata about the next-generation sequencing itself. Image manages the attributes externalimageid and externalresourceid which are IDs used for accessing the image within Bisque. The attached imagetype contains information about the staining of the image. All entities with a name like <name>_audittrail hold information about documented changes made to the attached entity <name>. It is shown that the timestamp, the name of the author and the performed changes are recorded within these entities. Several entities comprise a deleted flag. If such an entity gets deleted it will not be removed within the database, but will not be shown in the frontend. This has the advantage that deleted entities can be restored by a database administrator.

Format: PNG; Size: 1.5MB Download file

Results and discussion

The Personalized Oncology Suite is an application combining biological and clinical data into one integrated solution. In this context, clinical data comprises information about cancer patients, TNM staging, and density values of TILs used for immune score estimation. Biological data describes mutations found via next-generation sequencing and whole-slide bioimages, which can be uploaded to the application. POS features different types of visualization techniques for all integrated data types. Furthermore, publicly available data from the COSMIC database[13] is integrated into POS. As data types are stored in different file formats, POS includes several data import and export possibilities for the most important formats. In addition, different filters on the data can be applied either individually or in combination (e.g. age at diagnosis and TNM stage). In the current implementation queries can be done only for single modalities (e.g. genetic features, images, or clinical parameters). Furthermore, patients can be selected based on their UICC stage or Immunoscore by using a range slider. POS features different user interface languages using internationalization and is fault tolerant as all inputs are validated. Furthermore, POS implements exception handling as well as an intelligent logging functionality.

POS has been released under the open-source license GNU AGPL to allow integration of new components from the scientific community. The source code can be accessed via the project website http://www.icbi.at/POS. Figure 2 outlines the different layers between the JSF based frontend and the relational database.

Fig2 Dander BMCBioinformatics2014 15.jpg

Figure 2: Software layers of POS. a) The JSF based presentation layer on top (blue) access the relational database (green) via the Java backend. This backend
can be accessed via the classes Controller or LazyDataModel. These classes make use of the DataAccessObject which provides a single access point to the
relational database by using object-relational mapping. b) The Controller is responsible for secure CRUD (create, read, update and delete) operations.
c) The LazyDataModel is used for providing immutable, filtered, sorted, and paginated collections to the presentation layer.

Clinical data and tumor staging

POS integrates clinical data of cancer patients including various attributes such as gender, date of diagnosis, disease duration, adjuvant therapy, and relapse. We have included compliant measures in the design of the software ensuring that patient-identifying data such as name, academic title, address, telephone number, e-mail address, date and place of birth, as well as date and place of death[14] are not collected. Patients are uniquely defined by alphanumerical identifiers. Thus, data from multiple clinical visits or any other information can be unambiguously assigned to an individual patient.

Tumor staging

In addition to clinical data, POS integrates the TNM cancer staging system. The T, N, and M categories are stored separately within the database and the resulting AJCC/UICC stage (0-IV) is determined from this information. POS facilitates the creation of descriptive plots for comparing patients in different TNM stages. These plots can also be used for comparison of data among different participating institutes, provided the user has permission to access the information. In addition to input data manually, clinical and staging data can be uploaded to POS in CSV format. Furthermore, these data types can be exported as XLS, CSV and PDF files.

Next-generation sequencing data

With the use of next-generation sequencing, biologists are able to determine the order of nucleotide bases composing the DNA. The identification of somatic mutations can be performed by sequencing tumor/normal pairs and subsequently comparing cancerous to healthy tissue. Several different applications exist which are able to analyze tumor/normal pairs regarding their somatic SNPs and INDELs.[15] Since there are a plethora of methods for analyses of NGS data, the design of the software focused on the integration of disparate data types rather than the analyses of raw data. Furthermore, as the methods and available tools improve at fast pace, integration of processed data makes the system more flexible and versatile. POS is able to integrate somatic SNPs and INDELs identified by such experiments. For uploading this type of data, users first need to define an analysis, which can be used to attach called mutations. POS supports data upload via VCF (Variant Call Format) files or by manual input of mutations.

Genome browser

As visualization of identified mutations tremendously supports the interpretation of these results, genome browsers have been developed to display mutations in the context of a reference genome.[15] POS includes a genome browser that features a combined view of different tracks, each containing a dedicated plot. Figure 3 shows a screenshot of the genome browser view within POS. On top of the genome browser panel, the region of interest can be specified by defining chromosome number, start and end position or by choosing a gene of interest. The user can select several patients at once and mutations for each patient will be displayed in separate tracks.

Fig3 Dander BMCBioinformatics2014 15.jpg

Figure 3: View of the genome browser. A genome browser visualizing mutations in the context of the reference genome including publicly available
annotations is shown. The first track depicts the ideogram of the chosen chromosome, followed by an axis showing its genomic coordinates. Next,
publicly available annotations derived from BioMart[16] are depicted. The bottom track holds information about uploaded mutations. The patient
name and the shown mutations were randomly generated.

Bioimages

Whole-slide bioimages can be used to estimate the density of TILs. However, due to the size of the files the required disk space can be very large. Therefore, the raw image files are not stored in POS. In order to integrate whole-slide images within POS, the application connects to several instances of the bioimage management application Bisque[4] as outlined in Figure 1. POS supports the following formats: 1) SVS format from Aperio, 2) VSI format from Olympus, 3) Hamamatsu formats NDPI and VMS, 4) TIFF format from Trestle, 5) Leica’s SCN format, 6) formats ZVI and CZI from Zeiss, and 7) OME.TIFF format from the Open Microscopy Environment OME.[17][18]

Image upload

POS supports different ways for uploading images. The first option is a direct upload module, where a patient needs to be selected before the upload starts. The second possibility allows access to already uploaded images within a connected Bisque instance and their assignment to selected patients in POS. The third option is an external upload application supporting the batch upload of multiple images at once. This application is also able to assign the correct relations between uploaded images and patients within POS.

Bisque connection

The connection to Bisque instances is administered by POS, which handles correct URL, port, username and password settings. In addition, tailored Java servlets for managing the encrypted communication between POS and Bisque have been developed. Downscaling and tiling of the images is performed by the connected Bisque systems. POS is able to display the created tiles in a Google maps like manner, using an adapted version of the interactive JavaScript widget PanoJS3. Hence, only tiles of the currently displayed region are fetched from Bisque. A screenshot of the POS image viewer is displayed in Figure 4.

Fig4 Dander BMCBioinformatics2014 15.jpg

Figure 4: Image viewer. This Figure depicts a screenshot of an example of a whole-slide bioimage. The image is stored at an external Bisque instance
and only tiles that correspond to the current image section are transferred to POS. General information about the patient and the image are depicted
on the right hand side. Various attributes like width, height, resolution, and information about the scanner are directly fetched from Bisque. Shown clinical
data were randomly generated.

Flexible batch upload module

As the JSF based image upload form of POS does not support batch upload of several images at once, the standalone application POS Image Uploader has been developed. This application is based on JavaFX and can be started on all systems where Java is installed. Figure 1 depicts this standalone application at the top. The POS Image Uploader is able to acquire the URL and corresponding login information of Bisque instances directly from POS. During the batch upload, each image is uploaded to a selected Bisque instance. Bisque returns identifiers for each uploaded image, which are stored within the POS database and linked to related patients. The communication between the standalone tool and the web-based application is managed by tailored web services.

Authorization and authentication system

As POS holds confidential and patient related data, the application needs to be secured. POS is secured by an authorization and authentication system (AAS) where participating users and institutes can be configured within this easy to use user management system.[19] The backend of the used AAS can either be a simple XML file or a web-based application running on a dedicated JBoss Application Server (see Figure 1). For tailored user access, POS applies different user roles, provided by the AAS. Access roles are defined for clinical, NGS, and imaging data as well as for administration. In order to provide custom access profiles, users can be assigned to several roles at once. Furthermore, all web services, used by the POS Image Uploader, authenticate each single connection via the AAS.

Finally, POS users can also share their data with other participating users, with the possibility to restrict access to certain data types. It should be noted that as access to patient related data needs to be secured, the implemented sharing concept and the AAS do not support public access to any data type. Rather, users can specify the type of data that can be shared with other users including images, genetic data, or clinical data. If an external user wants access to this integrated data a request must be performed and all participating institutes will be informed about this request.

Installation

As POS is a web-based application, it can be accessed from any operating system. The only requirements are an up to date web browser and a network connection to the POS server. The server itself is able to run on any operating system with installed Java and R, and has been extensively tested on Linux and Windows machines. The database backend is interchangeable - POS has been tested with the database systems PostgreSQL and MySQL. A detailed installation guide for POS is available at the project home page, accessible at http://www.icbi.at/POS. As POS makes use of several Bisque installations, an installation guide for Bisque is provided at the same location.

Conclusions

POS is a web-based application combining clinical and biomolecular data including NGS data and bioimages. Due to its modular and flexible architecture it can be easily extended and adapted to different requirements. POS provides an intuitive user interface supporting data upload, manipulation and visualization of integrated data types. In summary, POS represents an effective solution for current challenges in clinical cancer research. The suite can be used not only in the context of personalized cancer immunotherapy but also in other studies where NGS data and images of tissue sections are generated.

Availability and requirements

Project name: Personalized Oncology Suite (POS)

Project home page: http://www.icbi.at/POS

Operating system: Platform independent

Programming language: Java, R/Bioconductor

License: GNU AGPL

Any restrictions to use by non-academics: No

Abbreviations

AAS: Authorization and Authentication System

AGPL: Affero General Public License

COSMIC: Catalogue of Somatic Mutations in Cancer

CRUD: Create, Read, Update and Delete

INDEL: Small insertion or deletion

JSF: JavaServer Faces

POS: Personalized Oncology Suite

SNP: Single-nucleotide polymorphism

TIL: Tumor-infiltrating lymphocyte

TNM: Classification of Malignant Tumors

URL: Uniform Resource Locator

VCF: Variant Call Format

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AD is the principle developer of POS. MB, MS and BH developed parts of the application. SP as well as ZT reviewed and tested POS and provided suggestions for additional features. All authors participated in writing of the manuscript, read and approved the final manuscript.

Acknowledgements

This work was supported by ONCOTYROL (FFG, Austrian Federal Ministries BMVIT/BMWFJ) and a grant from the Standortagentur Tirol.

References

  1. Couzin-Frankel, J. (2013). "Cancer Immunotherapy". Science 342 (6165): 1432-1433. doi:10.1126/science.342.6165.1432. PMID 24357284. 
  2. Hackl, H.; Stocker, G.; Charoentong, P.; Mlecnik, B.; Bindea, G.; Galon, J.; Trajanoski, Z. (2010). "Information technology solutions for integration of biomolecular and clinical data in the identification of new cancer biomarkers and targets for therapy". Pharmacology & Therapeutics 128 (3): 488–498. doi:10.1016/j.pharmthera.2010.08.012. PMID 20832425. 
  3. Loyek, C.; Rajpoot, N.M.; Khan, M.; Nattkemper, T.W. (2011). "BioIMAX: A Web 2.0 approach for easy exploratory and collaborative access to multivariate bioimage data". BMC Bioinformatics 12: 297. doi:10.1186/1471-2105-12-297. PMC PMC3161928. PMID 21777450. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161928. 
  4. 4.0 4.1 Kvilekval, K.; Fedorov, D.; Obara, B.; Singh, A.; Manjunath, B.S. (2010). "Bisque: A platform for bioimage analysis and management". Bioinformatics 26 (4): 544-52. doi:10.1093/bioinformatics/btp699. PMID 20031971. 
  5. Scholtalbers, J.; Rössler, J.; Sorn, P.; de Graaf, J.; Boisguérin, V.; Castle, J.; Sahin, U. (2013). "Galaxy LIMS for next-generation sequencing". Bioinformatics 29 (9): 1233-1234. doi:10.1093/bioinformatics/btt115. PMID 23479349. 
  6. Mariette, J.; Escudié, F.; Allias, N.; Salin, G.; Noirot, C.; Thomas, S.; Klopp, C. (2012). "NG6: Integrated next generation sequencing storage and processing environment". BMC Genomics 13: 462. doi:10.1186/1471-2164-13-462. PMC PMC3444930. PMID 22958229. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444930. 
  7. Jenkinson, A.M.; Albrecht, M.; Birney, E.; Blankenburg, H.; Down, T.; Finn, R.D.; Hermjakob, H.; Hubbard, T.J.P.; Jimenez, R.C.; Jones, P.; Kähäri, A.; Kulesha, E.; Macías, J.R.; Reeves, G.A.; Prlić, A. (2008). "Integrating biological data-the Distributed Annotation System". BMC Bioinformatics 9 (Suppl 8): S3. doi:10.1186/1471-2105-9-S8-S3. 
  8. Nix, D.A.; Di Sera, T.L.; Dalley, B.K.; Milash, B.A.; Cundick, R.M.; Quinn, K.S.; Courdy, S.J. (2010). "Next generation tools for genomic data generation, distribution, and visualization". BMC Bioinformatics 11: 455. doi:10.1186/1471-2105-11-455. PMC PMC2944281. PMID 20828407. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2944281. 
  9. Allan, C.; Burel, J-M.; Moore, J.; Blackburn, C.; Linkert, M.; Loynton, S.; MacDonald, D.; Moore, W.J.; Neves, C.; Patterson, A.; Porter, M.; Tarkowska, A.; Loranger, B.; Avondo, J.; Lagerstedt, I.; Lianas, L.; Leo, S.; Hands, K.; Hay, R.T.; Patwardhan, A.; Best, C.; Kleywegt, G.J.; Zanetti, G.; Swedlow, J.R. (2012). "OMERO: flexible, model-driven data management for experimental biology". Nature Methods 9 (3): 245-53. doi:10.1038/nmeth.1896. PMC PMC3437820. PMID 22373911. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3437820. 
  10. Segagni, D.; Tibollo, V.; Dagliati, A.; Zambelli, A.; Priori, S.G.; Bellazzi, R. (2012). "An ICT infrastructure to integrate clinical and molecular data in oncology research". BMC Bioinformatics 13: 4. doi:10.1186/1471-2105-13-4. PMC PMC3280155. PMID 22226192. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280155. 
  11. Bauch, A.; Adamczyk, I.; Buczek, P.; Elmer, F-J.; Enimanev, K.; Glyzewski, P.; Kohler, M.; Pylak, T.; Quandt, A.; Ramakrishnan, C.; Beisel, C.; Malmström, L.; Aebersold, R.; Rinn, B. (2011). "openBIS: A flexible framework for managing and analyzing complex data in biology research". BMC Bioinformatics 12: 468. doi:10.1186/1471-2105-12-468. PMC PMC3275639. PMID 22151573. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3275639. 
  12. Wolstencroft, K.; Haines, R.; Fellows, D.; Williams, A.; Withers, D.; Owen, S.; Soiland-Reyes, S.; Dunlop, I.; Nenadic, A.; Fisher, P.; Bhagat, J.; Belhajjame, K.; Bacall, F.; Hardisty, A.; Nieva de la Hidalga, A.; Balcazar Vargas, M.P.; Sufi, S.; Goble, C. (2013). "The Taverna workflow suite: Designing and executing workflows of Web Services on the desktop, web or in the cloud". Nucleic Acids Research 41 (W1): W557-W561. doi:10.1093/nar/gkt328. PMC PMC3692062. PMID 23640334. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692062. 
  13. 13.0 13.1 Forbes, S.A.; Bindal, N.; Bamford, S.; Cole, C.; Kok, C.Y.; Beare, D.; Jia, M.; Shepherd, R.; Leung, K.; Menzies, A.; Teague, J.W.; Campbell, P.J.; Stratton, M.R.; Futreal, P.A. (2011). "COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer". Nucleic Acids Research 39 (Suppl 1): D945-D950. doi:10.1093/nar/gkq929. PMC PMC3013785. PMID 20952405. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013785. 
  14. Schütze, B. (2012). "Use of medical treatment data outside of the patient supply – best way pseudonymisation". Deutsche Medizinische Wochenschrift 137 (16): 844-50. doi:10.1055/s-0031-1299040. PMID 22495919. 
  15. 15.0 15.1 Pabinger, S.; Dander, A.; Fischer, M.; Snajder, R.; Sperk, M.; Efremova, M.; Krabichler, B.; Speicher, M.R.; Zschocke, J.; Trajanoski, Z. (2014). "A survey of tools for variant analysis of next-generation genome sequencing data". Briefings in Bioinformatics 15 (2): 256-78. doi:10.1093/bib/bbs086. PMC PMC3956068. PMID 23341494. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3956068. 
  16. Kasprzyk, A. (2011). "BioMart: Driving a paradigm change in biological data management". Database 2011: bar049. doi:10.1093/database/bar049. PMC PMC3215098. PMID 22083790. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3215098. 
  17. Goode, A.; Gilbert, B.; Harkes, J.; Jukic, D.; Satyanarayanan, M. (2013). "OpenSlide: A vendor-neutral software foundation for digital pathology". Journal of Pathology Informatics 4: 27. doi:10.4103/2153-3539.119005. PMC PMC3815078. PMID 24244884. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3815078. 
  18. Linkert, M.; Rueden, C.T.; Allan, C.; Burel, J-M.; Moore, W.; Patterson, A.; Loranger, B.; Moore, J.; Neves, C.; MacDonald, D.; Tarkowska, A.; Sticco, C.; Hill, E.; Rossner, M.; Eliceiri, K.W.; Swedlow, J.R. (2010). "Metadata matters: access to image data in the real world". Journal of Cell Biology 189 (5): 777-82. doi:10.1083/jcb.201004104. PMC PMC2878938. PMID 20513764. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2878938. 
  19. Maurer, M.; Molidor, R.; Sturn, A.; Hartler, J.; Hackl, H.; Stocker, G.; Prokesch, A.; Scheideler, M.; Trajanoski, Z. (2005). "MARS: microarray analysis, retrieval, and storage system". BMC Bioinformatics 6: 101. doi:10.1186/1471-2105-6-101. PMC PMC1090551. PMID 15836795. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090551. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Table 1 has also been modified to a Y/N format from its original check mark/X format.