Difference between revisions of "Journal:Development of Biosearch System for biobank management and storage of disease-associated genetic information"

From LIMSWiki
|}
|}


Biosearch System differs from other LIMS in its customized approach for the CEGMR laboratory, while remaining flexible enough to be adjusted in the future to suit the needs of projects and researchers, including the addition of equipment, procedures, and/or software to complement or improve workflow without changing the core code. For example, one essential customization needed for research purposes is linking SMARTGENE software, a tool for the diagnostic lab, with Biosearch System to exchange data, with researchers' consent. It can be used for any laboratory (e.g., QC, R&D, and analytical service) with minor modifications, whereas most commercial software is too rigid. Its flexibility extends to data quality through computerized calculations and statistics, automation of processes, and the minimization of manual lab tasks such as data entry. For instance, the system ensures data entry occurs in a standardized way using pre-defined drop-down lists for most processes.


Biosearch System efficiently maintains the quality of data and permits flexibility in the workflow, ultimately synchronizing the biobank system of CEGMR/KAU. The current state of the CEGMR biobank is as follows:
|}
|}


===Identification of DEGs for breast cancer===
We conducted a genome-wide expression study to understand the molecular phenomena leading to triple-negative breast cancer (TNBC) and found 1,181 differentially expressed genes (DEGs), with a ''p''-value <0.05 and fold change (FC) ±2. The upregulated genes for TNBC were IFI6, LEF1, CCR8, FANCI, TRIM59, CASC5, and PLXNA3, while downregulated genes were ADH1B, LYVE1, ADH1C, ADH1B, FIGF, ADIPOQ, PLIN1, and LYVE1. Hierarchical clustering of the top 135 DEGs shows a distinct expression pattern between the triple-negative and control samples retrieved through the LIMS (Fig. 6).
[[File:Fig6 Karim JofKSUScience2022 34-2.jpg|1200px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="1200px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 6.''' Hierarchical clustering of DEGs. Unsupervised clustering showing the expression pattern of genes in triple-negative samples in Biosearch System. Blue and red colors indicate down- and upregulated genes, respectively. Rows and columns represent samples and DEGs, respectively.</blockquote>
|-
|}
|}
===Pathways associated with triple-negative breast cancer===
Molecular pathway analysis revealed more than 100 canonical pathways, of which the most activated were (i) interferon signaling pathways (z-score = 2.646), with participating genes IFI6, FNG, IRF1, IRF9, MED14, MX1, PSMB8, STAT1, and STAT2; and (ii) kinetochore metaphase signaling pathways (z-score = 2.138), with participating genes BUB1, BUB1B, CDK1, CENPL, KIF2C, KNL1, KNTC1, NUF2, PLK1, PPP1CA, PPP1R14B, PTTG1, RAD21, REC8, and TTK (Fig. 7).
[[File:Fig7 Karim JofKSUScience2022 34-2.jpg|1000px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="1000px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 7.''' Interferon signaling. Overall, the pathway is predicted to be upregulated, and participating differentially expressed genes are IFI6, FNG, IRF1, IRF9, MED14, MX1, PSMB8, STAT1, and STAT2.</blockquote>
|-
|}
|}


==Discussion==
Today’s biobanks are much more than just sample repositories. They store a huge amount of clinical and experimental data related to specimens. (Calleros et al., 2012) However, efficient data management is a bottleneck in the genomic medicine research process. It has been observed on many occasions that researchers face difficulties in data collection, maintenance, follow-up studies, and sometimes even in publishing the work without missing data. (Greely, 2007, Kauffmann and Cambon-Thomsen, 2008, Minamikumo, 2012) Translational genomic research is based on a secure database and biobank system. A well-designed data management system provides the possibility of finding significant associations among stored information and facilitates diagnostics and therapeutics research. (Lemmon et al., 2011, Zerhouni, 2005)
Our LIMS, Biosearch System, is used to collect, store, distribute, and maintain specimen data, as well as its relevant clinical and experimental information, to support ongoing research that can lead to the discovery of novel cancer biomarkers and beneficial therapeutic targets. (Karim et al., 2016, Karim et al., 2019, Merdad et al., 2014, Merdad et al., 2015, Mirza et al., 2015, Mirza et al., 2014, Rasool et al., 2021, Rasool et al., 2020, Schulten et al., 2016, Subhi et al., 2020, Sultan et al., 2021) We used SQL Server Management Studio, Visual Basic, .NET, and the Ajax control kit to develop its interactive database, and ASP.NET for better code management, cleaner code structure, and faster web applications. The first prototype of Biosearch System was ready in 2010, and after feasibility testing and a successful trial run, we released the present final version for use within CEGMR/KAU, with minor modifications. The LIMS is a web-based tool accessible from both on and off campus. A high level of security allows only authorized researchers to access the database. Security threats such as hacking and virus attacks were addressed without any significant compromise. (Bjugn and Hansen, 2013, Mintzer et al., 2013, Rogers et al., 2011, Simeon-Dubach et al., 2013) Bioethical safety was another major concern, which demanded proper care in establishing the biobank and developing the database. ((ISBER), 2012, Bredenoord et al., 2011, Hansson, 2009, McGuire et al., 2008, Rotimi and Marshall, 2010)
Biosearch System makes the job easier for everybody involved in research. First, clinicians and pathologists can include the clinico-pathological information of each provided specimen from hospitals/clinics through an online portal. Second, biobank staff can store and maintain the specimens and extracted derivatives easily. Finally, researchers can extract clinical information and request specific specimens/derivatives per the requirements of their ongoing project. (Kang et al., 2013) Presently our dataset is small, and we are collecting selected cases only, so the frequency of disease should not be used to represent society at large. However, the frequency of different types of cancer in our biobank is similar to that in our national cancer registry. In the future, the LIMS can pave the way for expanding this database in association with the Saudi cancer registry or any other national-level databases.
The LIMS was tested by CEGMR staff and was found satisfactory. The users were comfortable organizing all specimen/sample-related information and found it user-friendly with simple icons, buttons, drop-down lists, etc. To ensure the safety of data, KAU network security was utilized, and only users with approved permissions and sufficient supporting anti-virus tools were allowed to access the system. A periodic system backup strategy also guards against data loss due to unforeseen disaster. The LIMS is customized according to the actual activities and workflow in the CEGMR lab, while remaining flexible enough to accept new modules that add more features in the future.
Presently, Biosearch System is not available as open-source software because of institutional policy, but we encourage interested researchers to request the code on an individual basis.
==Conclusion and future work==
Biosearch System is a user-friendly LIMS solution for managing biobank specimens and their associated clinical information to facilitate genomic medicine research, leading to the discovery of disease biomarkers and therapeutic targets. It is a flexible system that can be modified to meet specific workflows. Going forward, new features and modules to be added to the CEGMR/KAU implementation include a barcoding system, quality control system, and reagent purchasing system.
==Acknowledgements==
We would like to thank the Biobank and IT unit staff of Center of Excellence in Genomic Medicine Research, AZIZ Supercomputing facilities at High-Performance Computing Center, and Deanship of Scientific Research, King Abdulaziz University for their help and technical support.
===Author contributions===
Sajjad Karim: Conceptualization, Data curation, Fund acquisition, Investigation, Project Administration, Software, Supervision, Writing - original draft. Adnan Merdad: Conceptualization, Fund acquisition, Validation, Writing - review & editing. Saddig Jastaniah: Conceptualization, Fund acquisition, Validation, Writing - review & editing. Sudhir Kumar: Conceptualization, Fund acquisition, Supervision. Adel Abuzenadah: Conceptualization, Project Administration, Resources, Validation, Writing - review & editing. Mohammed Al-Qahtani: Conceptualization, Project Administration, Resources, Validation, Writing - review & editing. Mona Alkharaz: Data curation, Formal analysis, Investigation, Methodology. Zeenat Mirza: Data curation, Investigation, Methodology, Software, Visualization, Writing - original draft. Mahmood Rasool: Data curation, Investigation, Validation, Writing - review & editing. Hend Noureldin: Formal analysis. Heba Abusamra: Formal analysis. Nofe Alganmi: Methodology, Resources, Software, Supervision, Visualization, Writing - review & editing.
===Funding===
This study was funded by Deanship of Scientific Research, King Abdulaziz University (2-117-1434-HiCi).
===Ethics approval and consent to participate===
The ethical committee of CEGMR, KAU approved this study (reference number: 08-CEGMR-02-ETH).
===Availability of data and materials===
Datasets (.CEL files) were submitted to NCBI’s GEO under [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36295 accession number GSE36295].
===Competing interests===
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


==Abbreviations, acronyms, and initialisms==

Revision as of 22:34, 4 July 2022

Full article title Development of Biosearch System for biobank management and storage of disease-associated genetic information
Journal Journal of King Saud University - Science
Author(s) Karim, Sajjad; Al-Kharraz, Mona; Mirza, Zeenat; Noureldin, Hend; Abusamara, Heba; Alganmi, Nofe; Merdad, Adnan; Jastaniah, Saddig; Kumar, Sudhir; Rasool, Mahmood; Abuzenadah, Adel; Al-Qahtani, Mohammed
Author affiliation(s) King Abdulaziz University, Temple University
Primary contact Email: skarim1 at kau dot edu dot sa
Year published 2022
Volume and issue 34(2)
Article # 101760
DOI 10.1016/j.jksus.2021.101760
ISSN 1018-3647
Distribution license Creative Commons Attribution 4.0 International
Website https://www.sciencedirect.com/science/article/pii/S1018364721004225
Download https://www.sciencedirect.com/science/article/pii/S1018364721004225/pdfft (PDF)

Abstract

Objective: Databases and software are important to manage modern high-throughput laboratories and store clinical and genomic information for quality assurance. Commercial software is expensive, with proprietary code issues, while academic versions have adaptation issues. Our aim was to develop an adaptable in-house software system that can store specimen- and disease-associated genetic information in biobanks to facilitate translational research.

Methods: A prototype was designed per the research requirements, and computational tools were used to develop the software under three tiers, using Visual Basic and ASP.NET for the presentation tier, SQL Server for the data tier, and Ajax and JavaScript for the business tier. We retrieved specimens from the biobank using this software and performed microarray-based transcriptomic analysis to detect differentially expressed genes (DEGs) with p-value <0.05 and fold change (FC) ±2 in triple-negative breast cancer cases. The Ingenuity Pathway Analysis (IPA) tool was used to predict canonical molecular pathways associated with disease. Overall performance and utility of the software were evaluated using Apache JMeter software, CRUD function testing, and a set of user feedback questionnaires.

Results: We developed Biosearch System, a web-based laboratory information management system (LIMS) enabling management of biobank samples (e.g., tissue, blood, FTTP slides) and their extracts (e.g., DNA, RNA, and proteins) with clinical and experimental details. The client satisfaction feedback was excellent, with a score of 4.7 out of 5. We identified a total of 1,181 DEGs, including both upregulated (IFI6, LEF1, FANCI, CASC5, PLXNA3, etc.) and down-regulated (ADH1B, LYVE1, ADH1C, ADH1B, ADIPOQ, PLIN1, LYVE1, etc.) genes in triple-negative breast cancer. Pathway analysis of DEGs revealed significant activation of interferon signaling (z-score 2.646) and kinetochore metaphase signaling pathway (z-score 2.138) in cancer.

Conclusion: Biosearch System is a user-friendly LIMS for collection, storage, and retrieval of specimen and clinical information. It is secure, efficient, and convenient in sample tracking and data analysis. We illustrated its utility in a transcriptomic study of breast cancer. Additionally, it can facilitate and speed up any genomic study and translational research publications.

Keywords: Biosearch System, LIMS, database, biobank, genomics, microarray, bioinformatics

Graphical abstract:

GA Karim JofKSUScience2022 34-2.jpg

Introduction

Biobanks are established as long-term storage and conservation facilities for biological specimens, along with their demographic, clinical, and experimental information, to support scientific investigation using bioinformatics tools (Artene et al., 2013). That investigation, tied to high-throughput technological advancements in next-generation sequencing (NGS) and microarrays, has generated significant amounts of data at relatively low costs. (Diamandis, 2009, Fehniger and Marko-Varga, 2011, Glenn, 2011, Merdad et al., 2014) However, as the number of biological specimens grows into the thousands, manual methods fail to handle specimens and samples efficiently, and the risk of information being lost increases. Quality becomes another major concern with the increasing volume of big data. As such, a robust software solution becomes necessary to effectively manage biobank and laboratory information and ensure the quality of results in line with approved ethical guidelines and personal integrity. ((ISBER), 2012, Bredenoord et al., 2011, Kang et al., 2013, Voegele et al., 2007) Software is helpful in managing the workflow cycle (comprising collection, storage, analysis, and report generation), facilitating quick and easy retrieval of information and speeding up biomarker and therapeutic discoveries. (Melo et al., 2010)

The collection of a significant cohort size with crucial factors like number and type of samples, clinical information, pathological findings, follow-up data, etc. is a time-consuming process but is strongly recommended to facilitate comprehensive translational research. (Betsou et al., 2010, Hewitt, 2011, Huang et al., 2011, Riegman et al., 2008) Bioinformatics software aiding with these tasks is available either as commercial off-the-shelf (COTS) software, with proprietary code and high cost, or as academic or open-source software, with complexity that makes it difficult to adopt within other laboratories. (Greely, 2007, Huang et al., 2011, Kauffmann and Cambon-Thomsen, 2008, Minamikumo, 2012, Prilusky et al., 2005) As such, we developed and implemented a software system to support biobank investigators in accessing all clinical and experimental disease-associated information required for basic and translational research. Herein, we discuss the application of our in-house laboratory information management system (LIMS), Biosearch System, for genomic analysis, specifically for breast cancer transcriptomics, with further extensibility to other diseases.

Materials and methods

Biosearch System was developed using a three-tier architecture model:

  1. Presentation tier: An interactive web browser for the end user’s computer
  2. Data tier: A Microsoft SQL Server database, administered through SQL Server Management Studio, that manages the storage and DB-server
  3. Business tier: A bridge between the other two tiers, facilitating the collection of data from the presentation tier and validating it before finally sending it to the data tier, and vice versa
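The role of the business tier can be illustrated with a minimal Python sketch. This is not the Biosearch System code (which the article describes as built with ASP.NET, Ajax, and JavaScript); the class names, field names, and the three-item list of allowed sample types are hypothetical and chosen only to show validation happening before anything reaches the data tier.

```python
# Hypothetical sketch of a three-tier hand-off; not the authors' implementation.
ALLOWED_SAMPLE_TYPES = {"BL", "TM", "TN"}  # illustrative subset of drop-down values


class DataTier:
    """Stands in for the database; stores validated records in memory."""

    def __init__(self):
        self.rows = []

    def save(self, record):
        self.rows.append(dict(record))
        return len(self.rows)  # acts as a simple record ID


class BusinessTier:
    """Validates input from the presentation tier before passing it on."""

    def __init__(self, data_tier):
        self.data_tier = data_tier

    def submit(self, form):
        errors = []
        if not str(form.get("mrn") or "").strip():
            errors.append("mrn is required")
        if form.get("sample_type") not in ALLOWED_SAMPLE_TYPES:
            errors.append("unknown sample type")
        if errors:
            return {"ok": False, "errors": errors}
        return {"ok": True, "id": self.data_tier.save(form)}


data_tier = DataTier()
business_tier = BusinessTier(data_tier)
accepted = business_tier.submit({"mrn": "MRN-0001", "sample_type": "BL"})
rejected = business_tier.submit({"mrn": "", "sample_type": "XX"})
```

In this sketch the rejected form never reaches the data tier, which mirrors the validation step described for the business tier.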

Biosearch System follows standard guidelines covering legal requirements, standardized architecture, workflow, and dedicated staffing, as well as technical procedures for recording and accessing clinical data, as described herein.

Legal requirements

The LIMS has been designed in accordance with the Saudi Arabian approved regulation for genomic medicine research. Patients were informed beforehand, and their doctors obtained written consent for their samples to be sent to the biobank for research. We provided patients with consent forms approved by the ethics committee. Additionally, the software stores data in coded format to hide the identity of patients during any research presentation and publication. In order to ensure bioethical safety, we followed the guidelines of Saudi Arabia's National Committee of Biological and Medical Ethics. (Royal decree No. M/59, dated 14/9/1431H – 24/8/2010)

Software and hardware architecture

We began with a prototype, an early model built to test the proposed concept and to let system analysts and users refine it. Database software design started with a low-fidelity (paper-based) prototype, followed by a high-fidelity (computer-based) prototype. Collected information was categorized into logical groups or entities like "sample," "storage condition," "specimen request," "approval system," "project type," "diagnosis," and "disease type," and an entity-relationship diagram was built to show the numerical relationships among different entities.

The LIMS itself was built on servers with the following specifications: Windows Server 2012 R2 Enterprise Edition Service Pack 2, .NET 4.5, IIS 8.5, VS 2012, Microsoft SQL Server 2008 R2, ASP.NET, and AJAX Control Kit 4.5. A web browser was used for the graphical user interface (GUI). Database software was developed to support the high-performance hardware system and is compatible with common web browsers. To develop a robust and reliable LIMS, the following features were included:

  • a web-based application for wide access to the database;
  • a specimen and sample labelling mechanism before enrolling them into the database;
  • security mechanisms that limit role-based access to authorized personnel; and
  • a disaster management system to cope with any natural disaster.

System structure

To manage the laboratory's services, the software addresses the (i) control of specimen and sample data (i.e., add, edit, confirm transfer, retrieve, and search); (ii) viewing of the log book; (iii) display of a specimen or sample's status; (iv) viewing of box content; (v) addition of new entities, tests, categories, etc. (e.g., diagnostic tests, extractions, hospitals, sample types, projects); and (vi) request of a specimen or sample. Specimens and samples stored in the biobank are provided to researchers per project requirements and with proper justification and ethical approval. The request is first checked by biobank staff and then forwarded to the lab manager, before finally going to the director for final approval. Researchers can track the processing steps while the software updates the decision status by email. After delivery of approved specimens, the requisite volume is automatically subtracted from the biobank stock.

Database structure

A standard relational database is used for the system. Information is stored in specific tables: the SampleInfo table, which contains the sample's information; the ProductInfo table, which contains the type of sample information (e.g., DNA, RNA); the PatientInfo table, which contains a patient's related information; the SampleStorage table, which contains the storage location information (e.g., refrigerators, shelf, box container); and the UsersInfo table, which contains user-related information. The relations are one-to-many (one patient::multi-samples, one sample::multi-products, one refrigerator::multi-samples, etc.).
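The one-to-many relations described above can be sketched as follows. The table names come from the text, but every column name is a hypothetical placeholder, the UsersInfo table is omitted for brevity, and Python's built-in SQLite module is used here in place of the Microsoft SQL Server database that Biosearch System runs on.

```python
import sqlite3

# Illustrative schema only: column names are assumptions, and SQLite stands in
# for the SQL Server database described in the article.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PatientInfo (mrn TEXT PRIMARY KEY);
CREATE TABLE SampleStorage (
    storage_id   INTEGER PRIMARY KEY,
    refrigerator TEXT, shelf TEXT, box TEXT
);
CREATE TABLE SampleInfo (            -- one patient :: many samples
    sample_id   INTEGER PRIMARY KEY,
    mrn         TEXT NOT NULL REFERENCES PatientInfo(mrn),
    storage_id  INTEGER REFERENCES SampleStorage(storage_id),
    sample_type TEXT NOT NULL
);
CREATE TABLE ProductInfo (           -- one sample :: many products
    product_id   INTEGER PRIMARY KEY,
    sample_id    INTEGER NOT NULL REFERENCES SampleInfo(sample_id),
    product_type TEXT NOT NULL
);
""")

conn.execute("INSERT INTO PatientInfo VALUES ('MRN-0001')")
conn.execute("INSERT INTO SampleStorage VALUES (1, 'F-01', 'S2', 'B7')")
conn.executemany(
    "INSERT INTO SampleInfo (sample_id, mrn, storage_id, sample_type) VALUES (?, ?, ?, ?)",
    [(1, "MRN-0001", 1, "BL"), (2, "MRN-0001", 1, "TM")],
)
conn.executemany(
    "INSERT INTO ProductInfo (sample_id, product_type) VALUES (?, ?)",
    [(1, "DNA"), (1, "RNA")],
)


def products_for_patient(connection, mrn):
    """List each sample of a patient with its derived products (None if no product)."""
    return connection.execute(
        """SELECT s.sample_id, s.sample_type, p.product_type
           FROM SampleInfo s
           LEFT JOIN ProductInfo p ON p.sample_id = s.sample_id
           WHERE s.mrn = ?
           ORDER BY s.sample_id, p.product_type""",
        (mrn,),
    ).fetchall()


rows = products_for_patient(conn, "MRN-0001")
```

The foreign keys express the direction of each relation: a sample row points at exactly one patient and one storage location, and a product row points at exactly one sample.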

Workflow and biobank organizational structure

The software can only be used with an authorized username and password, and after successful login on a web-based client PC, users are allowed to add new data, update/edit data, follow up with patients, and retrieve data for analysis and interpretation. Biosearch System also manages the storage, quality, quantity, distribution, and maintenance of specimens (e.g., tissue, blood, serum) and their derivatives (e.g., DNA, RNA, protein, plasma). Per the existing CEGMR biobank organizational structure, assigned users have different levels of rights for using the software. Researchers are end users of the biobank; they request samples based on their active research projects. Dedicated biobank staff examine the request and update the status, handing it off to the supervisor, who approves or rejects the request with a valid reason. However, the final decision is made by the director, who has ultimate authority to reconsider the request or reverse the supervisor's decision. Once approved by the director, the biobank delivers the requested items within a week. Reminders for pending tasks are emailed to relevant staff until the final decision is made. This system gives researchers full access to the biobank information by allowing for the management of specimens and samples, notification of request approval or rejection, and provision of an overall summary of the biobank’s inventory.
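The approval chain and the automatic stock deduction can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the system's actual logic: the status names, role names, and `advance` function are hypothetical, the supervisor's recommendation is reduced to a single review step, and volumes are in arbitrary units.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    CHECKED = "checked by biobank staff"
    REVIEWED = "reviewed by lab manager"
    APPROVED = "approved by director"
    REJECTED = "rejected by director"
    DELIVERED = "delivered"


# (current status, acting role) -> statuses that role may move the request to
TRANSITIONS = {
    (Status.SUBMITTED, "biobank_staff"): {Status.CHECKED},
    (Status.CHECKED, "lab_manager"): {Status.REVIEWED},
    (Status.REVIEWED, "director"): {Status.APPROVED, Status.REJECTED},
    (Status.APPROVED, "biobank_staff"): {Status.DELIVERED},
}


@dataclass
class SampleRequest:
    biobank_number: str
    volume: float
    status: Status = Status.SUBMITTED
    history: list = field(default_factory=list)


def advance(request, role, new_status, stock):
    """Move a request one step forward if the role is allowed to do so."""
    allowed = TRANSITIONS.get((request.status, role), set())
    if new_status not in allowed:
        raise PermissionError(
            f"{role} cannot move a request from {request.status.name} to {new_status.name}"
        )
    if new_status is Status.DELIVERED:
        # Delivery subtracts the requested volume from the biobank stock.
        available = stock.get(request.biobank_number, 0.0)
        if request.volume > available:
            raise ValueError("requested volume exceeds biobank stock")
        stock[request.biobank_number] = available - request.volume
    request.history.append((role, new_status))
    request.status = new_status
    return request


stock = {"BL-1252-14D": 50.0}
request = SampleRequest("BL-1252-14D", volume=20.0)
for role, status in [
    ("biobank_staff", Status.CHECKED),
    ("lab_manager", Status.REVIEWED),
    ("director", Status.APPROVED),
    ("biobank_staff", Status.DELIVERED),
]:
    advance(request, role, status, stock)
```

Keeping the allowed transitions in one table makes it easy to see that only the director can approve or reject, and that stock changes only at delivery.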

Technical procedure for linking clinical information to biobank specimens

The LIMS manages all the relevant information for all deposited patients’ specimens and samples for systematic clinical research. We use a medical record number (MRN) as the primary key and a biobank number as the secondary key to connect two sections of the database. Patients’ samples are stored at the biobank with a unique allocated biobank number, which consists of 10 characters formatted as sample type (XX), serial number (XXXX), year (XX), and extraction type (XX). Sample types in our database include AM (amniotic fluid), BL (blood), BO (bone marrow), CO (cord blood), CS (cervix swab), LN (lymph node), PC (product of conception), PL (paraffin embedded tissue with lymph node), PN (normal paraffin embedded tissue), PT (tumor paraffin embedded tissue), TM (tumor tissue), and TN (normal tissue). Depending on research needs, extraction is done from raw specimens, and the extraction type includes D (DNA), R (RNA), P (protein), etc. For a peripheral blood specimen received with serial number 1252 in year 2014, with a DNA sample then extracted from it, our assigned biobank number will be “BL-1252-14D.” This nomenclature system provides a clue about samples; however, patient confidentiality is protected as per current security regulations. To guarantee the quality of samples, the biobank has established standard policies via a quality management system. Specimens and samples are stored in liquid nitrogen or in −80 °C, −20 °C, or 4 °C refrigerators, depending on their type and requirements. Vials containing a specimen or extracted sample are stored in an assigned area so that all aliquots can be retrieved in the LIMS from the defined physical location.
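The numbering scheme can be made concrete with a short Python sketch that builds and parses identifiers in the style of the “BL-1252-14D” example. The function names and the regular expression are illustrative assumptions; in particular, the sketch assumes hyphens placed as in the example and an extraction code of one or two letters, which the text does not state explicitly.

```python
import re
from typing import NamedTuple

# Sample-type codes listed in the text.
SAMPLE_TYPES = {"AM", "BL", "BO", "CO", "CS", "LN", "PC", "PL", "PN", "PT", "TM", "TN"}

# Assumed layout: TYPE-SERIAL-YY followed by a 1-2 letter extraction code.
_PATTERN = re.compile(r"^([A-Z]{2})-(\d{4})-(\d{2})([A-Z]{1,2})$")


class BiobankNumber(NamedTuple):
    sample_type: str
    serial: int
    year: int        # two-digit year as written in the identifier
    extraction: str


def format_biobank_number(sample_type, serial, year, extraction):
    """Build an identifier such as 'BL-1252-14D' from its parts."""
    if sample_type not in SAMPLE_TYPES:
        raise ValueError(f"unknown sample type: {sample_type}")
    if not 0 <= serial <= 9999:
        raise ValueError("serial number must fit in four digits")
    return f"{sample_type}-{serial:04d}-{year % 100:02d}{extraction}"


def parse_biobank_number(text):
    """Split an identifier back into its parts, rejecting malformed input."""
    match = _PATTERN.match(text)
    if match is None or match.group(1) not in SAMPLE_TYPES:
        raise ValueError(f"not a valid biobank number: {text}")
    return BiobankNumber(match.group(1), int(match.group(2)), int(match.group(3)), match.group(4))


number = format_biobank_number("BL", 1252, 2014, "D")
parts = parse_biobank_number(number)
```

Here `number` reproduces the example from the text, and `parts` recovers blood (BL), serial 1252, year 14, and DNA (D).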

Evaluation of software features and performance efficiency

We evaluated the efficiency of Biosearch System by performance testing using (i) Apache JMeter software, (ii) Create, Read, Update, and Delete (CRUD) function testing, and (iii) a set of user feedback questionnaires. To verify the efficiency (i.e., speed, scalability, and stability) of the LIMS, a performance test was done through Apache JMeter by running different numbers of users (i.e., 500, 100, 50, 10, 1), with a loop count of 50 and a ramp-up period of 10. CRUD function testing using different SQL statements was also used for performance testing.
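As a rough illustration of what CRUD function timing involves, the sketch below times one insert, select, update, and delete statement against an in-memory SQLite table. It is not the authors' test harness: the `time_crud` function, the table layout, and the use of SQLite rather than SQL Server are all assumptions made for the example, and the timings it produces say nothing about Biosearch System itself.

```python
import sqlite3
import time


def time_crud(n_rows=1000):
    """Time one create/read/update/delete statement each on a scratch table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SampleInfo (sample_id INTEGER PRIMARY KEY, sample_type TEXT)")
    timings = {}

    def timed(label, sql, params=None):
        start = time.perf_counter()
        if params is None:
            cursor = conn.execute(sql)
        else:
            cursor = conn.executemany(sql, params)
        rows = cursor.fetchall()  # empty list for statements without a result set
        conn.commit()
        timings[label] = time.perf_counter() - start
        return rows

    timed("create", "INSERT INTO SampleInfo (sample_type) VALUES (?)", [("BL",)] * n_rows)
    read_rows = timed("read", "SELECT COUNT(*) FROM SampleInfo")
    timed("update", "UPDATE SampleInfo SET sample_type = 'TM'")
    timed("delete", "DELETE FROM SampleInfo")
    remaining = conn.execute("SELECT COUNT(*) FROM SampleInfo").fetchone()[0]
    conn.close()
    return timings, read_rows[0][0], remaining


timings, rows_read, rows_remaining = time_crud(100)
```

The returned dictionary holds elapsed seconds per operation; repeating the call with larger `n_rows` gives a crude view of how each operation scales.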

We also evaluated the LIMS through feedback provided by questionnaire concerning the LIMS' features and efficiency. Respondents rated specific questions on a scale from 1 to 5, such as:

  • Is it easy to reach the website application?
  • Does the website application load quickly?
  • Are the fonts easy to read on various screen resolutions?
  • Is the color scheme appropriate and comfortable to the eye?
  • Is the content logically separated and appearing in an appropriate way?
  • Is the website application easy to use?
  • Is navigation easy?
  • Are all buttons (internal and external) valid and active?
  • Is the copy and paste feature allowed?
  • Is the autocomplete feature allowed?
  • Do clickable icons work smoothly on a single click?
  • Is the website application free from server-side errors?
  • Are you able to search, retrieve, and edit data (e.g., samples description, patient information, project details, etc.) easily?
  • Do you accurately receive an alert message for missing data?
  • Is data printed in an appropriate tabular format?
  • Is the help desk easy to contact for any issue in regards to the web-based application?

Transcriptomic analysis of breast cancer using specimens from Biosearch System

We retrieved breast cancer samples and healthy controls and performed transcriptomic profiling via the Affymetrix platform. Partek GS v6.7 (Partek, USA) was used for data analysis. Imported data was normalized with a robust multi-array averaging process, and ANOVA was applied to generate the DEG list using a p-value <0.05 and fold change (FC) >2. Principal component analysis (PCA) was done for high-dimensional visualization. Unsupervised hierarchical clustering was done for significant DEGs as a similarity matrix.
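The cutoff step can be illustrated with a minimal Python sketch. It assumes signed fold-change values (negative for downregulation) and treats the FC cutoff as an absolute value of at least 2; both are assumptions, as is the `select_degs` function itself, and the analysis in the study was done in Partek GS rather than with code like this. The gene names and numbers below are placeholders, not study results.

```python
def select_degs(rows, p_cutoff=0.05, fc_cutoff=2.0):
    """Split (gene, p_value, signed_fold_change) rows into up- and downregulated lists."""
    upregulated, downregulated = [], []
    for gene, p_value, fold_change in rows:
        if p_value < p_cutoff and abs(fold_change) >= fc_cutoff:
            if fold_change > 0:
                upregulated.append(gene)
            else:
                downregulated.append(gene)
    return upregulated, downregulated


# Placeholder values for illustration only.
example_rows = [
    ("GENE_A", 0.001, 3.2),   # passes both cutoffs, upregulated
    ("GENE_B", 0.200, 4.0),   # fails the p-value cutoff
    ("GENE_C", 0.010, -2.5),  # passes both cutoffs, downregulated
    ("GENE_D", 0.030, 1.4),   # fails the fold-change cutoff
]
up, down = select_degs(example_rows)
```

Only genes that satisfy both the significance and the fold-change criteria are kept, which is why GENE_B and GENE_D drop out.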

Molecular pathway analysis

Pathways for DEGs were analyzed using the Ingenuity Pathway Analysis (IPA) tool (Ingenuity, USA) in triple-negative breast cancer cases. IPA predicts the molecular networks and canonical pathways associated with the uploaded DEGs, given their p-values and FC cutoffs.

Results

Over the past 13 years (2007–2020), we have collected a large number of specimens (n = 25,396) and their derivatives (n = 27,882), with those numbers growing every year (Fig. 1, Table 1). Given the challenges of managing all that associated data, we established a disease-oriented biobank and developed Biosearch System, an in-house LIMS to better manage biobanks and store disease-associated genomic information.


Fig1 Karim JofKSUScience2022 34-2.jpg

Figure 1. Collection and storage of specimen and disease-associated information at CEGMR biobank. Pictorial graph showing progressive growth of specimen/derivatives from zero in 2007 to more than 27,000 by 2020.

{| class="wikitable"
|+ Table 1. Frequencies of extracts derived from clinical specimen stored in CEGMR biobank
|-
! Derivative (extraction type) !! Abbreviation !! Number !! Percentage
|-
| DNA || D || 22,212 || 80.54611061
|-
| RNA || R || 4,276 || 14.83588925
|-
| MicroRNA || N || 47 || 0.09367844
|-
| Protein || P || 21 || 0.072861009
|-
| Plasma || M || 348 || 1.207411005
|-
| Serum || S || 288 || 0.999236694
|-
| Cell || C || 243 || 0.843105961
|-
| miRNA of Plasma || PN || 44 || 0.024287003
|-
| DNA of Plasma || PD || 118 || 0.041634862
|-
| RNA of Plasma || PR || 22 || 0.041634862
|-
| miRNA of Serum || SN || 35 || 0.020817431
|-
| DNA of Serum || SD || 125 || 0.03816529
|-
| RNA of Serum || SR || 102 || 0.041634862
|-
| Total || || 27,882 || 100%
|}

A prototype of an entity-relationship diagram was developed to depict the numerical relationships among different entities such as sample, storage condition, specimen request, approval system, project type, diagnosis, and disease type (Fig. 2). We established communication between the Biosearch System host server, managed using SQL Server Management Studio, and end users by employing ASP.NET and the JavaScript/Ajax toolkit (Fig. 3). The system was divided into three main parts: (i) data and specimen acquisition, (ii) data management, and (iii) specimen distribution/request (Fig. 4). It classifies each user into one of four groups (i.e., biobank technician, researcher, lab manager, and director), and hierarchical permissions are given accordingly. The administrator of Biosearch System has the power to perform all functions, including the addition of new users and granting permissions according to hierarchy (Table 2).


Fig2 Karim JofKSUScience2022 34-2.jpg

Figure 2. The entity-relationship diagram for database structure used for Biosearch System, depicting the numerical relationship among different entities such as sample, storage condition, specimen request, approval system, project type, diagnosis, and disease type.

Fig3 Karim JofKSUScience2022 34-2.jpg

Figure 3. Black box of system structure. The relationship and the communication process between the server side and client side, where Ajax and JavaScript send the client request to the SQL server side, and ASP.NET shows the processed response back at the client end using the GUI.

Fig4 Karim JofKSUScience2022 34-2.jpg

Figure 4. Biological sample life cycle. Sequential flow of each biobank specimen starting from collection and moving on to processing of sample, storage, maintenance, request, retrieval, distribution, and utilization.

Table 2. Type of users and granted job permission in the system
User type Job/permissions of user
Director ▪ Final decision to accept or reject requested samples
▪ Can search and retrieve biobank data
Lab manager ▪ Checking and commenting on the requested samples
▪ Either add new abbreviation for extraction type, diagnostic type, hospital name, sample type, etc. or request the same to administrator
▪ Edit user permission
Researcher ▪ Add new project with its ID, period, type of diagnostic, and title; also specify the required sample for project to provide accordingly
▪ Request sample
▪ Search and retrieve data based on sample ID, sample type, extraction type, nationality, and hospital name
Biobank team ▪ Allowed to add and edit sample information such as sample ID, sample type, diagnostic type, add new vial, edit extraction
▪ Track specimens and samples
▪ Can determine the storage location (e.g., refrigerator number, refrigerator shelf, box container, coordinates) and actual status (e.g., extraction date, quantity, concentration, purity, availability) of specimens and samples requested by researchers
▪ Can extract the Excel or .csv file of all data using "Show logbook"
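The user types in Table 2 amount to a role-based permission scheme. The sketch below shows one way such a mapping could be expressed; the role keys and action names (`decide_request`, `export_logbook`, and so on) are hypothetical labels chosen to mirror the table, not identifiers from the actual system.

```python
# Hypothetical role-to-action mapping, paraphrasing the rows of Table 2.
PERMISSIONS = {
    "director": {"decide_request", "search_data"},
    "lab_manager": {"review_request", "add_abbreviation", "edit_user_permission"},
    "researcher": {"add_project", "request_sample", "search_data"},
    "biobank_team": {"edit_sample", "track_specimen", "locate_specimen", "export_logbook"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True only if `role` is known and has been granted `action`."""
    return action in PERMISSIONS.get(role, set())
```

Under this sketch, `is_allowed("researcher", "request_sample")` is true, while `is_allowed("researcher", "decide_request")` is false, since the final decision on requested samples rests with the director. Unknown roles are denied by default.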

Biosearch System is supported by a data warehouse and a query tool interface that can be used to efficiently search and short-list patients by particular criteria without compromising their privacy. Based on evaluation form analysis, we found it to be a robust, secure, flexible, efficient, and user-friendly database. It proves robust, as it is platform-independent and compatible with any version of Windows, and its scripts are bug-free so as to avoid hanging the system. It proves secure because, in addition to an appropriate permission strategy for different users and password protection, a King Abdulaziz University (KAU) network security strategy was incorporated to transfer data safely: each user has their own ID and password to access a specific computer, and the password is stored in the database in encrypted form. Each machine that has access to the LIMS must have its own firewall and Trend Micro antivirus office scan agent provided by the KAU IT deanship. The system performs backups at a regular interval (i.e., monthly) on KAU servers, so any data lost from the system can be easily restored, thereby protecting the data from disasters. The system proves flexible, as new features/modules can easily be added by authorized users to customize the LIMS according to laboratory and researcher requirements. It is also efficient, as it has been competently managing large amounts of CEGMR's clinical and experimental data since 2014. Finally, the LIMS proves user-friendly, as simple icons, buttons, drop-down lists, etc. make it convenient for users to manage, add, and retrieve data.

Our assessment of the performance and utility of the LIMS drew on real-time feedback from users, who gave it an overall satisfaction score of 4.7 out of 5. The results of the performance analysis of the search, insert, update, and delete functions were also satisfactory (Table 3). Biosearch System is efficient in terms of the number of transactions and users it manages while maintaining data quality and permitting flexibility in laboratory workflow.

Table 3. Results of a performance test of search, insert, update, and delete functions.
 
Samples = The number of samples with the same label; Avg. time (ms) = The average elapsed time of a set of results, in milliseconds; Min = The lowest elapsed time for the samples with the same label; Max = The longest elapsed time for the samples with the same label; Std. dev. = The standard deviation of the sample elapsed time; Error % = Percent of requests with errors; Throughput = Throughput measured in requests per second/minute/hour; Received (KB/sec) = Downstream throughput measured in kilobytes per second; Sent (KB/sec) = Upstream throughput measured in kilobytes per second; Avg. (bytes) = Average size of the sample response, in bytes.
# of users Samples Avg. time (ms) Min Max Std. dev. Error % Throughput Received (KB/sec) Sent (KB/sec) Avg. (bytes)
Search function results
500 users 25,000 1,489 99 10,950 972.25 0.0 1040.15 248 152 1,185.01
100 users 5,000 285 18 2,190 189.85 0.0 210.03 48.29 29.99 235.01
50 users 2,500 142 26 334 61.63 0.0 154.90 37.36 22.06 236.04
10 users 500 30 23 58 3.51 0.0 49.29 11.09 7.04 235.99
1 user 50 31 25 53 4.38 0.0 35.64 8.01 5.05 237.99
Insert function results
500 users 25,000 1,492 3 11,489 1,075 0.0 1,019 230.12 263.40 1,234.01
100 users 5,000 287 3 2,299 213.92 0.0 205.04 48.04 57.93 236.99
50 users 2,500 130 13 265 59.03 0.0 155.01 34.97 43.86 239.48
10 users 500 11 11 23 1.20 0.0 52.02 11.99 14.56 239.99
1 user 50 11 11 19 1.32 0.0 79.93 18.40 22.81 237.92
Update function results
500 users 25,000 1,747 5 14,855 2,025 0.0 856.20 278.03 232.8 1,678
100 users 5,000 351 4 2,972 405.01 0.0 171.78 54.97 45.99 337.06
50 users 2,500 422 21 901 161.01 0.0 85.56 27.37 22.05 333.82
10 users 500 21 20 37 2.32 0.0 50.40 16.06 14.06 334.09
1 user 50 21 21 60 5.98 0.0 45.44 14.60 12.24 334.12
Delete function results
500 users 25,000 1,425 4 11,870 1,095.02 0.0 1,034.07 249.02 233.50 1,200.01
100 users 5,000 285 4 2,380 221.01 0.0 207.01 49.07 47.05 241.08
50 users 2,500 118 12 300 59.07 0.0 159.05 36.12 36.02 241.02
10 users 500 13 12 30 1.35 0.0 52.05 13.08 11.62 239.92
1 user 50 13 12 21 2.042 0.0 77.92 19.03 17.63 241.01
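The column definitions given for Table 3 can be illustrated with a short calculation. This is a minimal sketch, assuming a list of per-request elapsed times in milliseconds, a count of failed requests, and the wall-clock duration of the test run; the function name `summarize` is hypothetical, and the use of the population standard deviation is an assumption, since the source does not state which form was used.

```python
import statistics
from typing import Dict, List


def summarize(elapsed_ms: List[float], errors: int, duration_s: float) -> Dict[str, float]:
    """Compute Table 3 style summary metrics for one set of requests."""
    n = len(elapsed_ms)
    return {
        "samples": n,
        "avg_ms": statistics.mean(elapsed_ms),
        "min_ms": min(elapsed_ms),
        "max_ms": max(elapsed_ms),
        # Population standard deviation (assumption; the source does not specify).
        "std_dev_ms": statistics.pstdev(elapsed_ms),
        "error_pct": 100.0 * errors / n,
        # Requests completed per second of wall-clock test time.
        "throughput_per_s": n / duration_s,
    }
```

For example, three requests taking 10, 20, and 30 ms with no errors over a 1.5-second run would give an average of 20 ms and a throughput of 2 requests per second. These inputs are illustrative only and are unrelated to the measurements reported in Table 3.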

Biosearch System differs from other LIMS in its customized approach for the CEGMR laboratory while remaining flexible enough to be adjusted in the future to suit the needs of projects and researchers, including the addition of equipment, procedures, and/or software to complement or improve workflow without changing the core code. For example, an essential customization needed for research purposes is to link SMARTGENE software, a tool for the diagnostic lab, with Biosearch System and exchange data, with researchers' consent. It can be used for any laboratory (e.g., QC, R&D, and analytical service) with minor modifications, whereas most commercial software is comparatively rigid. Its flexibility extends to data quality through computerized calculations and statistics, automation of processes, and the minimization of manual lab tasks such as data entry. For instance, the system ensures data entry occurs in a standardized way using pre-defined drop-down lists for most processes.
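Standardized entry through pre-defined drop-down lists is, in effect, validation against a controlled vocabulary. The sketch below illustrates the idea under stated assumptions: the field names and the allowed values in `ALLOWED` are hypothetical placeholders, not the actual lists used in Biosearch System.

```python
from typing import Dict, List

# Hypothetical controlled vocabularies, standing in for drop-down list contents.
ALLOWED = {
    "sample_type": {"blood", "tissue", "saliva"},
    "extraction_type": {"DNA", "RNA"},
}


def validate_entry(entry: Dict[str, str]) -> List[str]:
    """Return one error message per field whose value is not in its allowed list."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = entry.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return errors
```

An entry such as `{"sample_type": "blood", "extraction_type": "DNA"}` passes with no errors, while a free-text variant like `"Blood "` is rejected, which is the kind of inconsistency that drop-down lists prevent at the point of entry.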

Biosearch System efficiently maintains the quality of data and permits pliability in the workflow, ultimately synchronizing the biobank system of CEGMR/KAU. The current state of the CEGMR biobank is as follows:

  1. Infrastructure: The biobank is well equipped with liquid nitrogen cylinders (−196 °C), deep freezers (−80 °C and −20 °C), refrigerators, and UPS systems.
  2. Personnel: Presently, 14 dedicated staff work to ensure the smooth running of the biobank unit.
  3. Workflow: Patient specimen/sample management is fully functional and regularly handles the dispatch, collection, processing, and storage of thousands of specimens.

The LIMS can be accessed by authorized personnel using the username only (Fig. 5). Our clinical database contains the following clinico-pathological parameters: CEGMR code for patient, receiving date, hospital MRN number, name, date of birth, age, sex, nationality, disease, date of diagnosis, status, filing date, histology, sites, grade, size, lymph node status, invasion, margin status, immunohistochemical data, family history, medication, follow-up history, etc. Similarly, the LIMS' specimen/sample management functionality manages information on specimens and their derivatives, including type of specimen, receiving date, extraction date, storage of the specimen and its derivatives, quality and quantity records, handling of researchers' requests, distribution, and maintenance of specimens.
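Searching samples by a small set of fields (sample ID, sample type, extraction type, nationality, and hospital name, as listed for researchers in Table 2) can be sketched with a parameterized query. The example below uses Python's built-in `sqlite3` module purely as a stand-in for the SQL Server backend; the table name `sample`, its columns, the helper names, and the demo rows are all hypothetical.

```python
import sqlite3
from typing import List

# Hypothetical searchable columns, mirroring the search criteria in Table 2.
SEARCHABLE = ("sample_id", "sample_type", "extraction_type", "nationality", "hospital_name")


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (sample_id TEXT PRIMARY KEY, sample_type TEXT, "
        "extraction_type TEXT, nationality TEXT, hospital_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
        [
            ("S-001", "blood", "DNA", "X", "Hospital A"),
            ("S-002", "tissue", "RNA", "X", "Hospital A"),
            ("S-003", "tissue", "DNA", "Y", "Hospital B"),
        ],
    )
    return conn


def search_samples(conn: sqlite3.Connection, **criteria: str) -> List[str]:
    """Return sample IDs matching all given criteria, using bound parameters."""
    unknown = set(criteria) - set(SEARCHABLE)
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    sql = "SELECT sample_id FROM sample"
    if criteria:
        # Column names come only from the SEARCHABLE whitelist; values are bound.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in criteria)
    sql += " ORDER BY sample_id"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]
```

Returning only sample identifiers, rather than patient-level fields, is one design choice consistent with short-listing cases without exposing personal details; how the real system restricts returned columns is not described in the source.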



Figure 5. A glimpse of Biosearch System. Screenshot of the new sample addition page, allowing authorized researchers to access samples and their disease associated genomic and clinical information.

Identification of DEGs for breast cancer

We conducted a genome-wide expression study to understand the molecular phenomenon leading to triple-negative breast cancer (TNBC) and found 1,181 differentially expressed genes, with p-value <0.05 and FC 2. The upregulated genes for TNBC were IFI6, LEF1, CCR8, FANCI, TRIM59, CASC5, and PLXNA3, while downregulated genes were ADH1B, LYVE1, ADH1C, FIGF, ADIPOQ, and PLIN1. Hierarchical clustering of the top 135 DEGs shows a distinct pattern for genes in the LIMS' triple-negative and control samples (Fig. 6).
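Selecting DEGs by a p-value and fold-change cutoff can be sketched as a simple filter. This is an illustrative sketch only: it assumes signed fold-change values (negative for downregulation) and reads the fold-change criterion as an absolute value of at least 2, which the source does not state explicitly. The function name `filter_degs` is hypothetical, and any input rows should be understood as placeholders rather than study data.

```python
from typing import Iterable, List, Tuple


def filter_degs(
    results: Iterable[Tuple[str, float, float]],
    p_cutoff: float = 0.05,
    fc_cutoff: float = 2.0,
) -> Tuple[List[str], List[str]]:
    """Split (gene, signed fold change, p-value) rows into up- and downregulated lists."""
    up, down = [], []
    for gene, fold_change, p_value in results:
        if p_value >= p_cutoff:
            continue  # not statistically significant under the assumed cutoff
        if fold_change >= fc_cutoff:
            up.append(gene)
        elif fold_change <= -fc_cutoff:
            down.append(gene)
    return up, down
```

Called on placeholder rows such as `("GENE_A", 3.1, 0.001)` and `("GENE_B", -2.5, 0.01)`, the function places the first gene in the upregulated list and the second in the downregulated list, while rows that miss either cutoff are dropped.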



Figure 6. Hierarchical clustering of DEGs. Unsupervised clustering showing the expression pattern of genes in triple-negative samples in Biosearch System. Blue and red colors indicate down- and upregulated genes, respectively. Rows and columns represent samples and DEGs, respectively.

Pathways associated with triple-negative breast cancer

Molecular pathway analysis revealed more than 100 canonical pathways, of which the most activated were (i) interferon signaling pathways (z-score = 2.646), with participating genes IFI6, FNG, IRF1, IRF9, MED14, MX1, PSMB8, STAT1, and STAT2; and (ii) kinetochore metaphase signaling pathways (z-score = 2.138), with participating genes BUB1, BUB1B, CDK1, CENPL, KIF2C, KNL1, KNTC1, NUF2, PLK1, PPP1CA, PPP1R14B, PTTG1, RAD21, REC8, and TTK (Fig. 7).



Figure 7. Interferon signaling. Overall, the pathway is predicted to be upregulated, and participating differentially expressed genes are IFI6, FNG, IRF1, IRF9, MED14, MX1, PSMB8, STAT1, and STAT2.

Discussion

Today’s biobanks are much more than just sample repositories. They store a huge amount of clinical and experimental data related to specimens. (Calleros et al., 2012) However, efficient data management is a bottleneck in the genomic medicine research process. It has been observed on many occasions that researchers face difficulties in data collection, maintenance, follow-up studies, and sometimes even in publishing the work without missing data. (Greely, 2007, Kauffmann and Cambon-Thomsen, 2008, Minamikumo, 2012) Translational genomic research is based on a secure database and biobank system. A well-designed data management system provides the possibility of finding significant associations among stored information and facilitates diagnostics and therapeutics research. (Lemmon et al., 2011, Zerhouni, 2005)

Our LIMS, Biosearch System, is used to collect, store, distribute and maintain specimen data, as well as its relevant clinical and experimental information to support ongoing research that can lead to the discovery of novel cancer biomarkers and beneficial therapeutic targets. (Karim et al., 2016, Karim et al., 2019, Merdad et al., 2014, Merdad et al., 2015, Mirza et al., 2015, Mirza et al., 2014, Rasool et al., 2021, Rasool et al., 2020, Schulten et al., 2016, Subhi et al., 2020, Sultan et al., 2021) We used SQL Server Management Studio, Visual Basic, .NET, and the Ajax control kit to develop its interactive database, and ASP.NET for better code management, cleaner code structure, and faster web applications. The first prototype of Biosearch System was ready in 2010, and after feasibility testing and a successful trial run, we released the present final version for use within CEGMR/KAU, with minor modifications. The LIMS is a web-based tool accessible from both on and off campus. A high level of security allows only authorized researchers to access the database. Security threats such as hacking, virus attack, etc. were addressed without any significant compromise. (Bjugn and Hansen, 2013, Mintzer et al., 2013, Rogers et al., 2011, Simeon-Dubach et al., 2013) Bioethical safety was another major concern, which demanded proper care in establishing the biobank and developing the database. ((ISBER), 2012, Bredenoord et al., 2011, Hansson, 2009, McGuire et al., 2008, Rotimi and Marshall, 2010).

Biosearch System makes the job easier for everybody involved in research. First, clinicians and pathologists can include the clinico-pathological information of each provided specimen from hospitals/clinics through an online portal. Second, biobank staff can store and maintain the specimen and extracted derivatives easily. Finally, researchers can extract clinical information and request specific specimens/derivatives per the requirements of their ongoing project. (Kang et al., 2013) Presently our dataset is small, and we are collecting selected cases only, so the frequency of disease should not be used to represent society at large. However, the frequency of different types of cancer in our biobank is similar to that in our national cancer registry. In the future, the LIMS can pave the way for expanding this database in association with the Saudi cancer registry or any other national-level databases.

The LIMS was tested by CEGMR staff and was found satisfactory. The users were comfortable organizing all specimen/sample-related information and found it user-friendly with simple icons, buttons, drop-down lists, etc. To ensure the safety of data, KAU network security was utilized, and only users with approved permissions and sufficient supporting anti-virus tools were allowed to access the system. A periodic system backup strategy also guards against data loss due to unforeseen disaster. The LIMS is customized according to the actual activities and workflow in the CEGMR lab, while remaining flexible enough to adopt new modules that add more features in the future.

Presently, Biosearch System is not available as open-source software because of institutional policy, but we encourage interested researchers to request the code on an individual basis.

Conclusion and future work

Biosearch System is a user-friendly LIMS solution for managing biobank specimens and their associated clinical information to facilitate genomic medicine research, leading to the discovery of disease biomarkers and therapeutic targets. It is a flexible system, one that is able to be modified to meet specific workflows. Going forward, new features and modules to be added to the CEGMR/KAU implementation include a barcoding system, quality control system, and reagent purchasing system.

Acknowledgements

We would like to thank the Biobank and IT unit staff of Center of Excellence in Genomic Medicine Research, AZIZ Supercomputing facilities at High-Performance Computing Center, and Deanship of Scientific Research, King Abdulaziz University for their help and technical support.

Author contributions

Sajjad Karim: Conceptualization, Data curation, Fund acquisition, Investigation, Project Administration, Software, Supervision, Writing - original draft. Adnan Merdad: Conceptualization, Fund acquisition, Validation, Writing - review & editing. Saddig Jastaniah: Conceptualization, Fund acquisition, Validation, Writing - review & editing. Sudhir Kumar: Conceptualization, Fund acquisition, Supervision. Adel Abuzenadah: Conceptualization, Project Administration, Resources, Validation, Writing - review & editing. Mohammed Al-Qahtani: Conceptualization, Project Administration, Resources, Validation, Writing - review & editing. Mona Alkharaz: Data curation, Formal analysis, Investigation, Methodology. Zeenat Mirza: Data curation, Investigation, Methodology, Software, Visualization, Writing - original draft. Mahmood Rasool: Data curation, Investigation, Validation, Writing - review & editing. Hend Noureldin: Formal analysis. Heba Abusamra: Formal analysis. Nofe Alganmi: Methodology, Resources, Software, Supervision, Visualization, Writing - review & editing.

Funding

This study was funded by Deanship of Scientific Research, King Abdulaziz University (2-117-1434-HiCi).

Ethics approval and consent to participate

The ethical committee of CEGMR, KAU approved this study (Reference Number: 08-CEGMR-02-ETH).

Availability of data and materials

Datasets (.CEL files) were submitted to NCBI’s GEO under accession number GSE36295.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations, acronyms, and initialisms

CEGMR: Center of Excellence in Genomic Medicine Research

DEG: differentially expressed gene

FC: fold change

GUI: graphical user interface

IPA: Ingenuity Pathway Analysis

KAU: King Abdulaziz University

LIMS: laboratory information management system

MRN: medical record number

NGS: next-generation sequencing

PCA: principal component analysis

References

Notes

This presentation is faithful to the original, with minor changes to presentation; grammar and spelling required more cleanup for improved readability. In some cases important information was missing from the references, and that information was added.