Journal:Application of informatics in cancer research and clinical practice: Opportunities and challenges

From LIMSWiki
Full article title Application of informatics in cancer research and clinical practice: Opportunities and challenges
Journal Cancer Innovation
Author(s) Hong, Na; Sun, Gang; Zuo, Xiuran; Chen, Meng; Liu, Li; Jiani, Wang; Feng, Xiaobin; Shi, Wenzhao; Gong, Mengchun; Ma, Pengcheng
Author affiliation(s) Digital Health China Technologies Co., Xinjiang Cancer Center, Huazhong University of Science and Technology, Southern Medical University, Chinese Academy of Medical Sciences and Peking Union Medical College, Tsinghua University
Primary contact Email: gmc at nrdrs dot org
Year published 2022
Volume and issue 1(1)
Page(s) 80–91
DOI 10.1002/cai2.9
ISSN 2770-9183
Distribution license Creative Commons Attribution 4.0 International
Website https://onlinelibrary.wiley.com/doi/10.1002/cai2.9
Download https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/cai2.9 (PDF)

Abstract

Cancer informatics has significantly progressed in the big data era. We summarize the application of informatics approaches to the cancer domain from both the informatics perspective (e.g., data management and data science) and the clinical perspective (e.g., cancer screening, risk assessment, diagnosis, treatment, and prognosis). We discuss various informatics methods and tools that are widely applied in cancer research and practices, such as cancer databases, data standards, terminologies, high-throughput omics data mining, machine learning algorithms, artificial intelligence imaging, and intelligent radiation. We also address the informatics challenges within the cancer field that pursue better treatment decisions and patient outcomes, and focus on how informatics can provide opportunities for cancer research and practices. Finally, we conclude that the interdisciplinary nature of cancer informatics and collaborations are major drivers for future research and applications in clinical practices. It is hoped that this review is instrumental for cancer researchers and clinicians with its informatics-specific insights.

Keywords: artificial intelligence application, cancer informatics, machine learning

Introduction

Advances in information science and technology have brought significant benefits to cancer research and care, including larger study cohorts, more complete follow-up, more effective clinician teams, lower costs, increased patient life expectancy, and improved quality of life. Although all cancer-related aspects of care, such as diagnosis, prognosis, and treatment, have improved significantly, cancer remains one of the most significant challenges in medical science because of disease heterogeneity and the need to identify underlying biomarkers that are potentially linked to specific cancer types.

Cancer informatics is a branch of medical informatics that applies information science, computer science, data science, and information technologies to the field of oncology. This is an area that deals with the resources, devices, and methods required to optimize the acquisition, storage, retrieval, and use of information in cancer. Applied cancer informatics transforms clinical data into meaningful and useful information to improve processes and outcomes in patient-focused and evidence-based cancer care. [1] The fundamental goals of cancer informatics are: (1) to organize data in a way that is comprehensible and meaningful to clinicians, researchers, and patients; (2) to use data to advance cancer care and treatment; and (3) to yield new insights through data analysis. [2]

The multidisciplinary field of cancer informatics includes oncology, pathology, radiology, computational biology, physical chemistry, computer science, information systems, information management, biostatistics, clinical informatics, bioinformatics, imaging informatics, machine learning (ML), artificial intelligence (AI), data mining, data compliance, and many other disciplines. The integration and intersection of these individual disciplines bridge the gap between these individual cancer-related fields and promote cancer research and clinical practice.

From the point of view of informatics, methods and tools enhance the classification, accessibility, and application of oncology data, thereby helping translate cancer treatment into better outcomes. For example, with the development of clinical and imaging oncology databases, radiomics and AI have flourished, providing clinicians with a technological foundation for the early detection and treatment of cancer. In clinical practice, radiologists are under tremendous pressure as the number of cancer patients rises quickly. Studies in AI radiotherapy aim to make radiotherapy easier and faster and to turn this labor-intensive procedure into a technology-intensive task. Another example is multi-omics analysis in precision oncology. Multi-omics analyses can overcome the limitations of single-omics approaches by integrating large amounts of molecular-level biological data across different dimensions, such as the genome, epigenome, transcriptome, proteome, metabolome, and microbiome. Moreover, they provide multi-level analyses and interpretations of complex life phenomena with many influencing factors, such as biological processes and diseases. With the popularization of next-generation high-throughput technologies and the accumulation of large amounts of multi-omics data, integration and fusion analysis for precise diagnosis and treatment of cancer has become an emerging trend.

To summarize the current progress in informatics methods and tools to enhance cancer research and improve cancer clinical practices, we reviewed the most common recent scenarios of informatics-supported applications. A graphic abstract summarizing the field of cancer informatics is depicted in Figure 1.



Figure 1. A summary of the main points of cancer informatics. AI = artificial intelligence.

Informatics-supported applications of cancer research and clinical practices

The informatics-based publications reviewed here were retrieved from the National Library of Medicine database (PubMed) and officially released web resources. They cover cancer databases, cancer knowledge organization systems, cancer omics, and precision medicine, as well as AI-supported cancer imaging and radiotherapy. Retrieved articles were manually screened according to criteria covering the following items: aim of the study, methods, results, and clinical scenarios.

Databases and data standards for oncology

Healthcare data are stored in various electronic systems and follow different formats, both structured and unstructured. Medical records contain critical elements that support cancer therapies, so storing, extracting, and encoding this information plays an important role in cancer treatment and research. Population-based cancer registry databases record information on incidence, mortality, and treatment outcomes, from which annual statistics are generated. [3] In contrast, hospital-based cancer databases provide more clinical information than population-based cancer registries, such as patient information, clinicopathological information, genomic data, disease staging, treatment, follow-up, lab test results, and medical records, which supports clinical research and improves the care of cancer patients. [3, 4] Furthermore, because data collected from different sources may be encoded in various terminological standards, a consistent coding system is needed to integrate them. [5] In addition, ontology, as an integration of knowledge, annotation, and concepts, plays an important role in cancer treatment and research.
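The need for a consistent coding system when integrating sources can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the source systems, the local codes, and the shared target codes are hypothetical placeholders, not entries from any real terminology. An actual integration would rely on curated, versioned mappings rather than a hand-written table.

```python
from dataclasses import dataclass

# Hypothetical mapping from (source system, local code) to a shared code.
# All keys and values are illustrative placeholders, not real terminology content.
LOCAL_TO_SHARED = {
    ("hospital_a", "BRCA-01"): "SHARED:0001",
    ("registry_b", "R-174"): "SHARED:0001",
    ("hospital_a", "LUNG-07"): "SHARED:0002",
}


@dataclass(frozen=True)
class Record:
    source: str       # system the record came from
    local_code: str   # diagnosis code as stored in that system


def harmonize(records):
    """Split records into (mapped, unmapped) using the shared mapping table."""
    mapped, unmapped = [], []
    for rec in records:
        shared = LOCAL_TO_SHARED.get((rec.source, rec.local_code))
        if shared is None:
            # Keep unmapped records for manual review instead of guessing a code.
            unmapped.append(rec)
        else:
            mapped.append((rec, shared))
    return mapped, unmapped
```

Returning the unmapped records separately, rather than dropping them, reflects the point that inconsistent coding has to be resolved explicitly before data from different sources can be pooled.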

Cancer databases and scientific programs

The databases built by the National Cancer Institute's (NCI's) Surveillance, Epidemiology, and End Results (SEER) program in 1973 and by the Centers for Disease Control and Prevention's (CDC's) National Program of Cancer Registries of the United States in 1995 are used to construct the US Cancer Statistics (USCS) [6, 7], while data from the National Central Cancer Registry of China are used to produce cancer statistics in China. [8, 9] The National Cancer Database (NCDB) of the United States is one of the largest cancer clinical registry databases, with over 34 million data sets of commonly diagnosed solid tumors added since 1989, and it supports an increasing number of published studies. [10, 11] Moreover, thousands of new genomes have been sequenced over the past few years. [12] The Cancer Genome Atlas (TCGA) was initiated in 2006 and has characterized more than 20,000 primary cancers at the molecular level, covering 33 cancer types to date. This database consists of genomic, expression, methylation, copy number variation, epigenomic, transcriptomic, and proteomic data amounting to more than 2.5 petabytes in volume. [13, 14] The International Cancer Genome Consortium (ICGC) supports genomic studies in more than 50 cancer types involving more than 25,000 cancer genomes at the genomic, epigenomic, and transcriptomic levels. [15]

Cancer classification, terminology, and ontology

Cancer classification is a primary concern during patient treatment. The International Classification of Diseases for Oncology (ICD-O), published by the World Health Organization (WHO), is widely implemented for tumor disease classification. ICD-O uses a multi-axial coding system to classify the anatomical site and the histology of a tumor. The first, second, and third editions of ICD-O were published in 1976, 1991, and 2000, respectively. [16-18] Furthermore, the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) uses concepts, descriptions, and relationships to build terminology systems that can map and link to other standards. [19, 20] It is used to encode cancer pathological checklists that aim to provide interoperable and portable diagnostic, prognostic, and predictive elements. [21, 22] The NCI has published a comprehensive logic-based terminology, the National Cancer Institute Thesaurus (NCIt), covering cancer-related components such as clinical findings, drugs, treatments, anatomy, genes, proteins, and molecular information. [23] Adverse events (AEs), a critical element in cancer clinical trials and research, are recorded in dictionaries such as the Common Terminology Criteria for Adverse Events (CTCAE) and the Medical Dictionary for Regulatory Activities (MedDRA), developed by the NCI and the International Conference on Harmonization (ICH), respectively. [24, 25] HemOnc.org, which contains information on drugs and regimens regarding their mechanism, U.S. Food and Drug Administration (FDA) approval, common usage, and synonyms, is published and maintained to keep pace with the growing number of chemotherapeutic regimens, and it combines definitions from sources such as RxNorm, SNOMED CT, and the NCIt. [19, 26, 27]
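The multi-axial structure of ICD-O can be sketched in Python as a record with a topography axis (anatomical site) and a morphology axis (histology plus a behavior digit). This is a simplified illustration, not a validator: the regular expressions only check the general shape of ICD-O-3 codes, the class and function names are hypothetical, and no lookup against the official code lists is attempted.

```python
import re
from dataclasses import dataclass

# Meaning of the behavior digit that follows the slash in a morphology code.
BEHAVIOR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic",
}

# Shape checks only (assumption: ICD-O-3 style codes such as C50.9 and 8500/3).
TOPOGRAPHY_RE = re.compile(r"^C\d{2}\.\d$")
MORPHOLOGY_RE = re.compile(r"^(\d{4})/([012369])$")


@dataclass(frozen=True)
class IcdOCode:
    topography: str  # anatomical site axis, e.g. "C50.9"
    histology: str   # four-digit cell type, e.g. "8500"
    behavior: str    # single behavior digit, e.g. "3"

    @property
    def behavior_label(self):
        return BEHAVIOR[self.behavior]


def parse_icdo(topography, morphology):
    """Build an IcdOCode from separate topography and morphology strings."""
    if not TOPOGRAPHY_RE.match(topography):
        raise ValueError(f"unexpected topography format: {topography!r}")
    match = MORPHOLOGY_RE.match(morphology)
    if match is None:
        raise ValueError(f"unexpected morphology format: {morphology!r}")
    return IcdOCode(topography, match.group(1), match.group(2))
```

For instance, `parse_icdo("C50.9", "8500/3")` keeps the site and the histology on separate axes, and its `behavior_label` is "malignant, primary site".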

The Cancer Care Treatment Outcome Ontology (CCTOO) describes treatment or trial endpoints for patients with solid tumors in four domains, 13 subgroups, and two hierarchical concept structures, with a total of 1,133 terms. [28] TNM-Ontology (TNM-O) consists of four parts: a representation of the primary tumor (T), a representation of regional lymph nodes (N), a representation of distant metastases (M), and the anatomical location of the tumor. It sets different T, N, and M code descriptors for tumors at different anatomical locations. TNM-O was implemented in a colorectal cancer database and achieved a 100% concordance rate after validation by experienced pathologists. [29] The Radiation Oncology Ontology (ROO) was published using Semantic Web technologies, forming a hierarchical structure containing 1,183 classes and 211 properties between classes [30], while the Radiation Oncology Structures (ROS) ontology was developed using a taxonomic hierarchy consisting of 417 classes, each with a number of subclasses, 81% of which can be mapped to the Unified Medical Language System (UMLS). [31] The Cancer Cell Ontology (CCL) was published to represent cancer cell types via immune phenotypes in the field of hematological malignancies, with a total of 6,900 classes (over 300 new classes added). [32] The Prostate Cancer Ontology (PCO) represents integrated information from multiple prostate databases using a nine-level hierarchical structure with 412 concepts. [33] Local terminologies, such as the Cervical Cancer Common Terminology [34], are used to support semantic interoperability and the utilization of local clinical data.
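The four-part structure described for TNM-O (T, N, M, and anatomical location) can be sketched as a small Python record. This is an assumption-laden simplification rather than the ontology itself: the pattern accepts only basic categories such as T0 to T4, Tis, TX, N0 to N3, NX, M0, and M1, it ignores prefixes and subcategories used in real staging, and the names `TnmRecord` and `parse_tnm` are hypothetical.

```python
import re
from dataclasses import dataclass

# Simplified pattern for strings such as "T3N1M0" (no prefixes or subcategories).
TNM_RE = re.compile(r"^T(?P<t>is|X|[0-4])N(?P<n>X|[0-3])M(?P<m>[01])$")


@dataclass(frozen=True)
class TnmRecord:
    site: str  # anatomical location; descriptors are site-specific in TNM-O
    t: str     # primary tumor category
    n: str     # regional lymph node category
    m: str     # distant metastasis category


def parse_tnm(site, code):
    """Parse a compact TNM string into its components for a given site."""
    match = TNM_RE.match(code)
    if match is None:
        raise ValueError(f"unsupported TNM string: {code!r}")
    return TnmRecord(site=site, **match.groupdict())


def has_distant_metastasis(record):
    """True when the M category indicates distant metastasis."""
    return record.m == "1"
```

Carrying the site alongside the T, N, and M values mirrors the point that the same code descriptor can mean different things at different anatomical locations.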

AI-supported image processing and radiotherapy

Medical imaging is a useful and important modality for cancer detection, progression monitoring, and prognosis prediction. Radiomics and radiotherapy are the two areas of medical research and application that have received the most attention in AI. Radiomics refers to converting images into structured, mineable data. [35] Most AI-supported image applications focus on early screening and diagnosis using ML methods based on predefined features extracted from medical images. [36] Radiation therapy is a pivotal cancer treatment that has significantly progressed over the last decade due to numerous technological breakthroughs. Within the traditional radiation therapy workflow, the areas that would benefit from AI include imaging, treatment planning, quality assurance, and outcome prediction. Many recent studies have shown that the adoption of radiomics and ML has paved the way for improved management of radiation therapy patients.
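As a rough illustration of what converting images into structured, mineable data can mean, the following Python sketch computes a few first-order intensity features over a region of interest. It is a toy example under stated assumptions: the image and mask are plain nested lists, the bin width of 25 is an arbitrary choice for discretizing intensities, and the function name is hypothetical. Real radiomics pipelines extract many more feature classes and handle resampling and normalization.

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=25):
    """Compute simple first-order features over the masked region.

    image: 2D list of non-negative integer intensities.
    mask:  2D list of the same shape; truthy entries mark the region of interest.
    """
    values = [
        px
        for img_row, mask_row in zip(image, mask)
        for px, keep in zip(img_row, mask_row)
        if keep
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Discretize intensities into fixed-width bins before computing entropy.
    counts = Counter(v // bin_width for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"n_pixels": n, "mean": mean, "variance": variance, "entropy": entropy}
```

The returned dictionary is one row of a feature table: repeating the computation per lesion or per patient gives the kind of structured input that the ML methods mentioned above operate on.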

AI imaging and diagnostics

AI has contributed to medical imaging in most oncology-related diagnoses by improving image quality, computer-aided image interpretation, and radiomics. Its application is crucial in radiology across modalities such as X-ray, ultrasound, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and digital pathology. Analysis of these quantitative image data yields predictive models for diagnosis, prognosis, and longitudinal monitoring based on a parsimonious set of informative imaging features. Highly specialized algorithms analyze the images with increased speed and accuracy.

According to papers published in recent years, the cancer locations most commonly addressed are the breast, kidney, brain, lung, prostate, cervix, and liver. The main AI algorithms are convolutional neural networks (CNNs), neural networks (NNs), support vector machines (SVMs), deep neural networks (DNNs), and ensemble learning techniques [37]. A recent study outlined the development and validation of an automated detection system for chest radiography with algorithms based on deep learning [38]. This automated system is designed to diagnose common thoracic diseases, including lung malignancies. The results of this study showed that AI-integrated systems have superior image recognition and analysis capabilities compared with human observers. In breast cancer, mammography is the first-line imaging screening modality, while ultrasound is the preferred option for younger women with dense breast tissue. A previous study demonstrated the influence of AI in breast imaging [39]; the authors compared the interpretation of mammography with and without the assistance of AI. Unsurprisingly, radiologists with AI assistance were able to analyze mammography images more quickly and more accurately, which is vital for the rapid detection of cancers. Further research directions for AI in medical imaging will focus on improving speed and reducing costs [40, 41]. Previous studies have also reported AI tools developed by Google that can search for morphologically similar features [41], regardless of annotation status. For example, LYmph Node Assistant (LYNA) is a Google-developed deep learning algorithm that can detect metastatic breast cancer on slides with up to 99% accuracy.
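Performance figures such as the accuracy quoted above are derived from counts of correct and incorrect calls against a reference standard. The following Python sketch shows one common way to compute accuracy, sensitivity, and specificity from those counts. The function name is hypothetical, and any counts passed to it in an example are made up for illustration; they are not results from the cited studies.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp/fn: positive cases detected / missed by the reader or algorithm.
    tn/fp: negative cases correctly cleared / incorrectly flagged.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one case is required")

    def ratio(num, den):
        # Undefined when a class is absent; report NaN instead of dividing by zero.
        return num / den if den else float("nan")

    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

Reporting sensitivity and specificity next to accuracy matters in screening settings, where positives are rare and a high accuracy alone can hide missed cancers.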


Abbreviations, acronyms, and initialisms

  • AE: adverse event
  • AI: artificial intelligence
  • CBCT: cone-beam computed tomography
  • CCTOO: The Cancer Care Treatment Outcome Ontology
  • CDC: Centers for Disease Control and Prevention
  • cGAN: conditional Generative Adversarial Network
  • CNN: convolutional neural network
  • CT: computed tomography
  • CTCAE: Common Terminology Criteria for Adverse Events
  • DIR: deformable image registration
  • EMR: electronic medical records
  • EPID: electronic portal imaging devices
  • ESR1: estrogen receptor 1
  • ICD-O: International Classification of Diseases for Oncology
  • ICGC: International Cancer Genome Consortium
  • ICH: International Conference on Harmonization
  • LYNA: LYmph Node Assistant
  • MedDRA: Medical Dictionary for Regulatory Activities
  • ML: machine learning
  • MRI: magnetic resonance imaging
  • MS: mass spectrometry
  • NCCR: National Central Cancer Registry
  • NCDB: National Cancer Database
  • NCI: National Cancer Institute
  • NCIt: National Cancer Institute Thesaurus
  • NGS: next-generation sequencing
  • NN: neural network
  • NPCR: National Program of Cancer Registries
  • PCO: Prostate Cancer Ontology
  • PET: positron emission tomography
  • ROO: Radiation Oncology Ontology
  • ROS: Radiation Oncology Structures
  • sCT: synthetic computed tomography
  • SEER: Surveillance, Epidemiology, and End Results
  • SNOMED CT: Systematized Nomenclature of Medicine Clinical Terms
  • SVM: support vector machine
  • TCGA: The Cancer Genome Atlas
  • TNM-O: TNM-Ontology
  • UMLS: Unified Medical Language System
  • USCS: US Cancer Statistics
  • WHO: World Health Organization

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.