Journal:Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces

Full article title	Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces
Journal	BioData Mining
Author(s)	Tsur, Elishai Ezra
Author affiliation(s)	Jerusalem College of Technology
Primary contact	Email: elishai85 at gmail dot com
Year published	2017
Volume and issue	10
Page(s)	11
DOI	10.1186/s13040-017-0130-z
ISSN	1756-0381
Distribution license	Creative Commons Attribution 4.0 International
Website	https://biodatamining.biomedcentral.com/articles/10.1186/s13040-017-0130-z
Download	https://biodatamining.biomedcentral.com/track/pdf/10.1186/s13040-017-0130-z (PDF)


Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistence and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Center for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistence agent and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains three-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at: http://nbel-lab.com.

Keywords: specialized databases, object-relational databases, EclipseLink, Apache Derby, object-oriented programming

Background

In the last few decades, the intersection of computer science and biology has evolved to the point at which answers to fundamental biological questions have emerged.^[1] Some of the most important cross-talk between biology and computer science lies within the data-intensive nature of modern biology.^[2] It is now evident that fields such as computational biology and bioinformatics are fueled by the growth in available computational resources and by the development of software encapsulation and abstraction layers.^[3] An important cornerstone of the interface between computer science and biology is object-centered reductionism, in which relations between discrete biological entities such as DNA, protein and RNA are investigated.^[1] Data regarding biological entities are stored in databases, which have become the most important cornerstone of research in computational biology and bioinformatics.

References

1. Kitano, H. (2002). "Computational systems biology". Nature 420 (6912): 206–10. doi:10.1038/nature01254. PMID 12432404.
2. Stein, L.D. (2003). "Integrating biological databases". Nature Reviews Genetics 4 (5): 337–45. doi:10.1038/nrg1065. PMID 12728276.
3. Cannata, N.; Merelli, E.; Altman, R.B. (2005). "Time to organize the bioinformatics resourceome". PLOS Computational Biology 1 (7): e76. doi:10.1371/journal.pcbi.0010076. PMC 1323464. PMID 16738704. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1323464.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Grammar and word use were updated to make the text easier to read.