Journal:Requirements for data integration platforms in biomedical research networks: A reference model

Full article title	Requirements for data integration platforms in biomedical research networks: A reference model
Journal	PeerJ Computer Science
Author(s)	Ganzinger, Matthias; Knaup, Petra
Author affiliation(s)	Heidelberg University
Primary contact	Email: matthias.ganzinger@med.uni-heidelberg.de
Editors	Juan, Hsueh-Fen
Year published	2015
Volume and issue	3
Page(s)	e755
DOI	10.7717/peerj.755
ISSN	2376-5992
Distribution license	Creative Commons Attribution 4.0 International
Website	https://peerj.com/articles/755/
Download	https://peerj.com/articles/755.pdf (PDF)


Abstract

Biomedical research networks need to integrate research data among their members and with external partners. To support such data sharing activities, an adequate information technology infrastructure is necessary. To facilitate the establishment of such an infrastructure, we developed a reference model for the requirements. The reference model consists of five reference goals and 15 reference requirements. Using the Unified Modeling Language, the goals and requirements are related to each other. In addition, all goals and requirements are described textually in tables. This reference model can be used by research networks as a basis for a resource-efficient acquisition of their project-specific requirements. Furthermore, a concrete instance of the reference model is described for a research network on liver cancer. The reference model is transferred into a requirements model of the specific network. Based on this concrete requirements model, a service-oriented information technology architecture is derived and also described in this paper.

Keywords: Research network, Reference model, Data integration, Biomedical informatics, Service-oriented architecture

Introduction

Current biomedical research is supported by modern biotechnological methods producing vast amounts of data (Frey, Maojo & Mitchell, 2007^[1]; Baker, 2010^[2]). In order to get a comprehensive picture of the physiology and pathogenic processes of diseases, many facets of biological mechanisms need to be examined. Contemporary research, e.g., investigating cancer, is a complex endeavor that can be conducted most successfully when researchers of multiple disciplines cooperate and draw conclusions from comprehensive scientific data sets (Welsh, Jirotka & Gavaghan, 2006^[3]; Mathew et al., 2007^[4]). As a frequent measure to support cooperation, research networks sharing common resources are established.

To generate added value from such a network, all available scientific and clinical data should be combined to facilitate a new, comprehensive perspective. This requires the provision of adequate information technology (IT), which is a challenge at all levels of biomedical research. For example, research networks inevitably need an IT infrastructure for sharing data and findings in order to make joint analyses possible. Data generated by biotechnological devices can only be evaluated thoroughly by applying biostatistical methods with IT tools.

However, data structures are often heterogeneous, resulting in the need for a data integration process. This process involves the harmonization of data structures by defining appropriate metadata (Cimino, 1998^[5]). Depending on the needs and data structures of the research network, a non-standard IT platform often has to be developed to meet its specific requirements. An important requirement might be the protection of data in terms of security and privacy, especially when patient data are involved.
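As an illustration of what metadata-driven harmonization can look like, the following minimal sketch maps records from two projects onto shared data elements. It is not taken from the platform described in this article: the field names, the target elements (patient_id, tumor_size_mm), and the FieldMapping and harmonize helpers are all hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldMapping:
    """Metadata describing how one source field maps to a common data element."""
    source_field: str
    target_element: str
    convert: Callable[[Any], Any] = lambda value: value  # identity by default


def harmonize(record: Dict[str, Any], mappings: List[FieldMapping]) -> Dict[str, Any]:
    """Express a source record in terms of the common data elements.

    Source fields without a mapping are dropped; mappings whose source
    field is absent from the record are skipped.
    """
    harmonized: Dict[str, Any] = {}
    for mapping in mappings:
        if mapping.source_field in record:
            value = record[mapping.source_field]
            harmonized[mapping.target_element] = mapping.convert(value)
    return harmonized


# Hypothetical metadata for two projects that record the same facts differently.
PROJECT_A = [
    FieldMapping("pat_id", "patient_id", str),
    FieldMapping("tumor_cm", "tumor_size_mm", lambda cm: cm * 10),
]
PROJECT_B = [
    FieldMapping("PatientID", "patient_id", str),
    FieldMapping("TumorSizeMM", "tumor_size_mm", float),
]

record_a = harmonize({"pat_id": 17, "tumor_cm": 2.5}, PROJECT_A)
record_b = harmonize({"PatientID": "42", "TumorSizeMM": "31"}, PROJECT_B)
print(record_a)
print(record_b)
```

After harmonization both records carry the same element names and units, so they can be analyzed together. Keeping the mapping as data rather than code means a new project can join by supplying its own mapping list.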

In the German research network SFB/TRR77 "Liver Cancer: From Molecular Pathogenesis to Targeted Therapies," it was our task to explore the most appropriate IT architecture for supporting networked research (Woll, Manns & Schirmacher, 2013^[6]). The research network consists of 22 projects sharing common resources and research data. To provide this network with a data integration platform we implemented a service-oriented architecture (SOA) (Taylor et al., 2004^[7]; Papazoglou et al., 2008^[8]; Wei & Blake, 2010^[9]; Bosin, Dessì & Pes, 2011^[10]). The IT system is based on the cancer Common Ontologic Representation Environment Software Development Kit (caCORE SDK) components of the cancer Biomedical Informatics Grid (caBIG) (Komatsoulis et al., 2008^[11]; Kunz, Lin & Frey, 2009^[12]). The resulting system is called pelican (platform enabling liver cancer networked research) (Ganzinger et al., 2011^[13]). These data sharing concepts can be transferred to other networks investigating different disease areas.

We consider our research network a typical example of a whole class of biomedical research networks. To support projects of this kind, we provide a framework for developing their data integration platforms. Specifically, we pursue the following two objectives:

Objective 1: Provide a reference model of requirements of biomedical research networks regarding an IT platform for sharing and analyzing data.

Objective 2: Design a SOA of an IT platform for our research network on liver cancer. It should implement the reference model for requirements. While this SOA is specific to this project, parts can be reused for similar projects.
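The relationship between the two objectives can be pictured as a reference model from which a project-specific requirements model is derived. The sketch below is only one possible reading, under the assumption that instantiation means selecting the reference requirements relevant to a project and keeping the goals they support. The goals and requirements shown (G1, G2, R1 to R3) are placeholders, not the five goals and 15 requirements of the actual reference model, and the class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Goal:
    identifier: str
    description: str


@dataclass
class Requirement:
    identifier: str
    description: str
    supports: List[str] = field(default_factory=list)  # identifiers of supported goals


@dataclass
class RequirementsModel:
    goals: Dict[str, Goal]
    requirements: Dict[str, Requirement]

    def instantiate(self, selected: List[str]) -> "RequirementsModel":
        """Derive a project-specific model (assumed semantics).

        Keeps only the selected requirements and the goals they support.
        """
        requirements = {rid: self.requirements[rid] for rid in selected}
        goal_ids = sorted({gid for req in requirements.values() for gid in req.supports})
        goals = {gid: self.goals[gid] for gid in goal_ids}
        return RequirementsModel(goals, requirements)


# Placeholder content for illustration only; not the article's actual model.
reference_model = RequirementsModel(
    goals={
        "G1": Goal("G1", "Share research data within the network"),
        "G2": Goal("G2", "Protect patient data"),
    },
    requirements={
        "R1": Requirement("R1", "Harmonize data structures via metadata", ["G1"]),
        "R2": Requirement("R2", "Restrict access to authorized members", ["G1", "G2"]),
        "R3": Requirement("R3", "Pseudonymize patient identifiers", ["G2"]),
    },
)

project_model = reference_model.instantiate(["R1"])
print(sorted(project_model.goals))
print(sorted(project_model.requirements))
```

A project that handles no patient data might select only R1, in which case the derived model retains G1 alone. Selecting R2 as well would bring G2 back in, because that requirement supports both goals.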

References

↑ Frey, L.J.; Maojo, V.; Mitchell, J.A. (2007). "Bioinformatics linkage of heterogeneous clinical and genomic information in support of personalized medicine". IMIA Yearbook of Medical Informatics 2007: 98–105. ISSN 0943-4747. PMID 17700912.
↑ Baker, M. (2010). "Next-generation sequencing: adjusting to data overload". Nature Methods 7 (7): 495–499. doi:10.1038/nmeth0710-495.
↑ Welsh, E.; Jirotka, M.; Gavaghan, D. (2006). "Post-genomic science: cross-disciplinary and large-scale collaborative research and its organizational and technological challenges for the scientific research process". Philosophical Transactions of the Royal Society A 364 (1843): 1533–1549. doi:10.1098/rsta.2006.1785. PMID 16766359.
↑ Mathew, J.P.; Taylor, B.S.; Bader, G.D.; Pyarajan, S.; Antoniotti, M.; Chinnaiyan, A.M.; Sander, C.; Burakoff, S.J.; Mishra, B. (2007). "From bytes to bedside: data integration and computational biology for translational cancer research". PLOS Computational Biology 3 (2): e12. doi:10.1371/journal.pcbi.0030012. PMC PMC1808026. PMID 17319736. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1808026.
↑ Cimino, J.J. (1998). "Desiderata for controlled medical vocabularies in the twenty-first century". Methods of Information in Medicine 37 (4–5): 394–403. PMC PMC3415631. PMID 9865037. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3415631.
↑ Woll, K.; Manns, M.; Schirmacher, P. (2013). "Sonderforschungsbereich SFB/TRR77: Leberkrebs: Von der molekularen Pathogenese zur zielgerichteten Therapie". Der Pathologe 34 (2): 232–234. doi:10.1007/s00292-013-1820-z.
↑ Taylor, K.L.; O’Keefe, C.M.; Colton, J.; Baxter, R.; Sparks, R.; Srinivasan, U.; Cameron, M.A.; Lefort, L. (2004). "A service oriented architecture for a health research data network". Proceedings of the 16th International Conference on Scientific and Statistical Database Management 2004: 443–444. doi:10.1109/SSDM.2004.1311251.
↑ Papazoglou, M.P.; Traverso, P.; Dustdar, S.; Leymann, F. (2008). "Service-oriented computing: A research roadmap". International Journal of Cooperative Information Systems 17: 223. doi:10.1142/S0218843008001816.
↑ Wei, Y.; Blake, M.B. (2010). "Service-oriented computing and cloud computing: Challenges and opportunities". IEEE Internet Computing 14 (6): 72–75. doi:10.1109/MIC.2010.147.
↑ Bosin, A.; Dessì, N.; Pes, B. (2011). "Extending the SOA paradigm to e-Science environments". Future Generation Computer Systems 27 (1): 20–31. doi:10.1016/j.future.2010.07.003.
↑ Komatsoulis, G.A.; Warzel, D.B.; Hartel, F.W.; Shanbhag, K.; Chilukuri, R.; Fragoso, G.; de Coronado, S.; Reeves, D.M.; Hadfield, J.B.; Ludet, C.; Covitz, P.A. (2008). "caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability". Journal of Biomedical Informatics 41 (1): 106–123. doi:10.1016/j.jbi.2007.03.009. PMC PMC2254758. PMID 17512259. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2254758.
↑ Kunz, I.; Lin, M.; Frey, L. (2009). "Metadata mapping and reuse in caBIG". BMC Bioinformatics 10 (Suppl 2): S4. doi:10.1186/1471-2105-10-S2-S4. PMC PMC2646244. PMID 19208192. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646244.
↑ Ganzinger, M.; Noack, T.; Diederichs, S.; Longerich, T.; Knaup, P. (2011). "Service oriented data integration for a biomedical research network". Studies in Health Technology and Informatics 169: 867–71. doi:10.3233/978-1-60750-806-9-867. PMID 21893870.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In most of the article's references DOIs and PubMed IDs were not given; they've been added to make the references more useful. In some cases important information was missing from the references, and that information was added.