Journal:Development of Biosearch System for biobank management and storage of disease-associated genetic information

Full article title	Development of Biosearch System for biobank management and storage of disease-associated genetic information
Journal	Journal of King Saud University - Science
Author(s)	Karim, Sajjad; Al-Kharraz, Mona; Mirza, Zeenat; Noureldin, Hend; Abusamara, Heba; Alganmi, Nofe; Merdad, Adnan; Jastaniah, Saddig; Kumar, Sudhir; Rasool, Mahmood; Abuzenadah, Adel; Al-Qahtani, Mohammed
Author affiliation(s)	King Abdulaziz University, Temple University
Primary contact	Email: skarim1 at kau dot edu dot sa
Year published	2022
Volume and issue	34(2)
Article #	101760
DOI	10.1016/j.jksus.2021.101760
ISSN	1018-3647
Distribution license	Creative Commons Attribution 4.0 International
Website	https://www.sciencedirect.com/science/article/pii/S1018364721004225
Download	https://www.sciencedirect.com/science/article/pii/S1018364721004225/pdfft (PDF)

Abstract

Objective: Databases and software are important to manage modern high-throughput laboratories and store clinical and genomic information for quality assurance. Commercial software is expensive, with proprietary code issues, while academic versions have adaptation issues. Our aim was to develop an adaptable in-house software system that can store specimen- and disease-associated genetic information in biobanks to facilitate translational research.

Methods: A prototype was designed per the research requirements, and computational tools were used to develop the software under three tiers, using Visual Basic and ASP.net for the presentation tier, SQL Server for the data tier, and Ajax and JavaScript for the business tier. We retrieved specimens from the biobank using this software and performed microarray-based transcriptomic analysis to detect differentially expressed genes (DEGs) with a fold change (FC) of ±2 and P-value <0.05 in triple-negative breast cancer cases. The Ingenuity Pathway Analysis (IPA) tool was used to predict canonical molecular pathways associated with disease. Overall performance and utility of the software were evaluated using Apache JMeter, CRUD function testing, and a set of feedback questionnaires.

Results: We developed Biosearch System, a web-based laboratory information management system (LIMS) enabling management of biobank samples (e.g., tissue, blood, FTTP slides) and their extracts (e.g., DNA, RNA, and proteins) with clinical and experimental details. The client satisfaction feedback was excellent, with a score of 4.7 out of 5. We identified a total of 1,181 DEGs, including both upregulated (IFI6, LEF1, FANCI, CASC5, PLXNA3, etc.) and downregulated (ADH1B, LYVE1, ADH1C, ADIPOQ, PLIN1, etc.) genes in triple-negative breast cancer. Pathway analysis of DEGs revealed significant activation of interferon signaling (z-score 2.646) and the kinetochore metaphase signaling pathway (z-score 2.138) in cancer.

Conclusion: Biosearch System is a user-friendly LIMS for collection, storage, and retrieval of specimen and clinical information. It is secure, efficient, and convenient for sample tracking and data analysis. We illustrated its utility in a transcriptomic study of breast cancer. Additionally, it can facilitate and speed up any genomic study and translational research publications.

Keywords: Biosearch System, LIMS, database, biobank, genomics, microarray, bioinformatics

Introduction

Biobanks are facilities established for the long-term storage and conservation of biological specimens, along with their demographic, clinical, and experimental information, to support scientific investigation using bioinformatics tools (Artene et al., 2013). That investigation, tied to high-throughput technological advancements in next-generation sequencing (NGS) and microarrays, has generated significant amounts of data at relatively low costs. (Diamandis, 2009, Fehniger and Marko-Varga, 2011, Glenn, 2011, Merdad et al., 2014) However, as the number of biological specimens grows into the thousands, manual methods fail to handle specimens and samples efficiently, and the risk of information being lost increases. Quality becomes another major concern with the increasing volume of big data. As such, a robust software solution becomes necessary to effectively manage biobank and laboratory information and to ensure the quality of results while complying with approved ethical guidelines and protecting personal integrity. (ISBER, 2012, Bredenoord et al., 2011, Kang et al., 2013, Voegele et al., 2007) Software is helpful in managing the workflow cycle (comprising collection, storage, analysis, and report generation), facilitating quick and easy retrieval of information and speeding up biomarker and therapeutic discoveries. (Melo et al., 2010)

The collection of a significant cohort size with crucial factors like number and type of samples, clinical information, pathological findings, follow-up data, etc. is a time-consuming process but is strongly recommended to facilitate comprehensive translational research. (Betsou et al., 2010, Hewitt, 2011, Huang et al., 2011, Riegman et al., 2008) Bioinformatics software aiding with these tasks is available either as commercial off-the-shelf (COTS) products, with proprietary code and high cost, or as academic or open-source tools, whose complexity makes them difficult to adopt in other laboratories. (Greely, 2007, Huang et al., 2011, Kauffmann and Cambon-Thomsen, 2008, Minamikumo, 2012, Prilusky et al., 2005) As such, we developed and implemented a software system to support biobank investigators in accessing all clinical and experimental disease-associated information required for basic and translational research. Herein, we discuss the application of our in-house laboratory information management system (LIMS), Biosearch System, to genomic analysis, specifically breast cancer transcriptomics, with extensibility to other diseases.

Materials and methods

Biosearch System was developed using a three-tier architecture model:

Presentation tier: An interactive web browser interface on the end user’s computer
Data tier: A SQL Server database, administered through SQL Server Management Studio, that manages storage and the database server
Business tier: A bridge between the other two tiers, facilitating the collection of data from the presentation tier and validating it before finally sending it to the data tier, and vice versa
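
The flow between the three tiers can be illustrated with a minimal Python sketch. This is not the Biosearch System code (which the article describes as built with Visual Basic, ASP.NET, JavaScript, and SQL Server); SQLite stands in for the data tier, and the form fields, the allowed sample types, and all function names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SampleForm:
    """Presentation tier: raw values as typed into a web form (hypothetical fields)."""
    sample_code: str
    sample_type: str
    volume_ul: str


# Assumed list of accepted sample types, for illustration only.
ALLOWED_TYPES = {"tissue", "blood", "DNA", "RNA", "protein"}


def validate(form: SampleForm) -> dict:
    """Business tier: check and normalise the form before it reaches storage."""
    code = form.sample_code.strip()
    if not code:
        raise ValueError("sample code is required")
    if form.sample_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown sample type: {form.sample_type}")
    volume = float(form.volume_ul)  # raises ValueError on non-numeric input
    if volume <= 0:
        raise ValueError("volume must be positive")
    return {"code": code, "type": form.sample_type, "volume_ul": volume}


def save(conn: sqlite3.Connection, record: dict) -> None:
    """Data tier: persist a validated record (SQLite stands in for SQL Server)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample "
        "(code TEXT PRIMARY KEY, type TEXT, volume_ul REAL)"
    )
    conn.execute("INSERT INTO sample VALUES (:code, :type, :volume_ul)", record)


def submit(conn: sqlite3.Connection, form: SampleForm) -> None:
    """Round trip: presentation -> business (validation) -> data."""
    save(conn, validate(form))
```

Keeping validation in the middle tier means that malformed input is rejected before any write is attempted, which is the role the business tier plays in the description above.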

Biosearch System follows standard guidelines covering legal requirements, standardized architecture, workflow, and dedicated staffing, as well as technical procedures to record and access clinical data, as described herein.

Legal requirements

The LIMS has been designed in accordance with the approved Saudi Arabian regulation for genomic medicine research. Patients were informed in advance, and their doctors obtained written consent for their samples to be sent to the biobank for research. The consent forms provided to patients were approved by the ethics committee. Additionally, the software stores data in coded format to hide the identity of patients in any research presentations and publications. In order to ensure bioethical safety, we followed the guidelines of Saudi Arabia's National Committee of Biological and Medical Ethics. (Royal decree No. M/59, dated 14/9/1431H – 24/8/2010)

Software and hardware architecture

We began with prototyping, an early model-building process in which system analysts and users test a proposed concept to improve its precision. Database software design started with a low-fidelity (paper-based) prototype, followed by a high-fidelity (computer-based) prototype. Collected information was categorized into logical groups or entities like "sample," "storage condition," "specimen request," "approval system," "project type," "diagnosis," and "disease type," and an entity-relationship diagram was built to show the numerical relationships among the different entities.

The LIMS itself was built on servers with the following specifications: Windows Server 2012 R2 Enterprise Edition Service Pack 2, .NET 4.5, IIS 8.5, VS 2012, Microsoft SQL Server 2008 R2, ASP.NET, and AJAX Control Kit 4.5. A web browser was used for the graphical user interface (GUI). Database software was developed to support the high-performance hardware system and is compatible with common web browsers. To develop a robust and reliable LIMS, the following features were included:

a web-based application for wide access to the database;
a specimen and sample labelling mechanism before enrolling them into the database;
security mechanisms that limit role-based access to authorized personnel; and
a disaster management system to cope with any natural disaster.
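
As an illustration of the role-based access feature listed above, the following Python sketch maps roles to permitted actions. The role names follow the actors mentioned in this article (researcher, biobank staff, lab manager, director), but the permission sets, action names, and functions are assumptions made for the example, not the actual access matrix of Biosearch System.

```python
from enum import Enum


class Role(Enum):
    RESEARCHER = "researcher"
    BIOBANK_STAFF = "biobank_staff"
    LAB_MANAGER = "lab_manager"
    DIRECTOR = "director"


# Assumed permission sets; the real system's matrix is not given in the article.
PERMISSIONS = {
    Role.RESEARCHER: {"search", "view_status", "request_sample"},
    Role.BIOBANK_STAFF: {"search", "view_status", "add", "edit", "retrieve"},
    Role.LAB_MANAGER: {"search", "view_status", "add", "edit", "retrieve",
                       "review_request"},
    Role.DIRECTOR: {"search", "view_status", "review_request", "final_approval"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role's permission set contains the action."""
    return action in PERMISSIONS.get(role, set())


def require(role: Role, action: str) -> None:
    """Raise PermissionError when the role may not perform the action."""
    if not is_allowed(role, action):
        raise PermissionError(f"{role.value} may not perform '{action}'")
```

A request handler would call `require(role, action)` before touching the database, so that an unauthorized call fails early.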

System structure

To manage the laboratory's services, the software addresses the (i) control of specimen and sample data (i.e., add, edit, confirm transfer, retrieve, and search); (ii) viewing of the log book; (iii) display of a specimen or sample's status; (iv) viewing of box content; (v) addition of new entities, tests, categories, etc. (e.g., diagnostic tests, extractions, hospitals, sample types, projects); and (vi) request of a specimen or sample. Specimens and samples stored in the biobank are provided to researchers per project requirements and with proper justification and ethical approval. Each request is first checked by biobank staff, then forwarded to the lab manager, and finally sent to the director for final approval. Researchers can track the processing steps while the software sends email updates on the decision status. After delivery of approved specimens, the requisite volume is automatically subtracted from the biobank stock.
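
The approval chain and automatic stock deduction described above can be sketched as a small state machine in Python. The class, function, and status names are hypothetical, the stock is modeled as a plain dictionary of volumes in microliters (an assumed unit), and email notification is left out.

```python
from dataclasses import dataclass, field

# Review order taken from the text: staff, then lab manager, then director.
APPROVAL_CHAIN = ["biobank_staff", "lab_manager", "director"]


@dataclass
class SampleRequest:
    sample_code: str
    volume_ul: float
    approvals: list = field(default_factory=list)
    status: str = "pending"

    def review(self, reviewer_role: str, approve: bool) -> None:
        """Record one review step; steps must follow APPROVAL_CHAIN in order."""
        if self.status != "pending":
            raise ValueError(f"request is already {self.status}")
        expected = APPROVAL_CHAIN[len(self.approvals)]
        if reviewer_role != expected:
            raise PermissionError(f"next reviewer must be {expected}")
        if not approve:
            self.status = "rejected"
            return
        self.approvals.append(reviewer_role)
        if len(self.approvals) == len(APPROVAL_CHAIN):
            self.status = "approved"


def deliver(request: SampleRequest, stock: dict) -> None:
    """Subtract the delivered volume from stock once the request is approved."""
    if request.status != "approved":
        raise ValueError("only approved requests can be delivered")
    available = stock[request.sample_code]
    if request.volume_ul > available:
        raise ValueError("insufficient stock")
    stock[request.sample_code] = available - request.volume_ul
    request.status = "delivered"
```

For example, a request for 30 µL against a stock of 100 µL leaves 70 µL after all three reviewers approve and `deliver` is called; a review submitted out of order raises `PermissionError`.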

Database structure

A standard relational database is used for the system. Information is stored in specific tables: the SampleInfo table, which contains the sample's information; the ProductInfo table, which contains the type of sample information (e.g., DNA, RNA); the PatientInfo table, which contains a patient's related information; the SampleStorage table, which contains the storage location information (e.g., refrigerator, shelf, box container); and the UsersInfo table, which contains user-related information. The relations are one-to-many (one patient to many samples, one sample to many products, one refrigerator to many samples, etc.).
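
The one-to-many relations above can be made concrete with a minimal schema sketch. The table names come from the text, but every column name is an assumption, and SQLite (through Python's sqlite3 module) stands in for the SQL Server database used by the actual system.

```python
import sqlite3

# Table names follow the article; all columns are hypothetical.
SCHEMA = """
CREATE TABLE PatientInfo (
    patient_id INTEGER PRIMARY KEY,
    coded_id   TEXT NOT NULL UNIQUE
);
CREATE TABLE SampleStorage (
    storage_id   INTEGER PRIMARY KEY,
    refrigerator TEXT, shelf TEXT, box TEXT
);
CREATE TABLE SampleInfo (
    sample_id     INTEGER PRIMARY KEY,
    patient_id    INTEGER NOT NULL REFERENCES PatientInfo(patient_id),
    storage_id    INTEGER REFERENCES SampleStorage(storage_id),
    specimen_type TEXT
);
CREATE TABLE ProductInfo (
    product_id   INTEGER PRIMARY KEY,
    sample_id    INTEGER NOT NULL REFERENCES SampleInfo(sample_id),
    product_type TEXT
);
CREATE TABLE UsersInfo (
    user_id INTEGER PRIMARY KEY, username TEXT, role TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one patient, two samples, three products."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO PatientInfo VALUES (1, 'P-0001')")
    conn.execute("INSERT INTO SampleStorage VALUES (1, 'Fridge-A', 'Shelf-2', 'Box-7')")
    conn.executemany("INSERT INTO SampleInfo VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, "tissue"), (2, 1, 1, "blood")])
    conn.executemany("INSERT INTO ProductInfo VALUES (?, ?, ?)",
                     [(1, 1, "DNA"), (2, 1, "RNA"), (3, 2, "DNA")])
    return conn


def products_for_patient(conn: sqlite3.Connection, coded_id: str) -> list:
    """Follow patient -> samples -> products and return the product types."""
    rows = conn.execute(
        """SELECT pr.product_type
           FROM PatientInfo p
           JOIN SampleInfo s   ON s.patient_id = p.patient_id
           JOIN ProductInfo pr ON pr.sample_id = s.sample_id
           WHERE p.coded_id = ?
           ORDER BY pr.product_id""",
        (coded_id,),
    )
    return [row[0] for row in rows]
```

Placing the foreign key on the "many" side (a patient reference in each sample row, a sample reference in each product row) is the usual way to express such relations, and it lets a single join retrieve every extract derived from one patient.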

Abbreviations, acronyms, and initialisms

CEGMR: Center of Excellence in Genomic Medicine Research

DEG: differentially expressed gene

LIMS: laboratory information management system

MRN: medical record number

References

Notes

This presentation is faithful to the original, with minor changes to presentation; grammar and spelling required more cleanup for improved readability. In some cases important information was missing from the references, and that information was added.