Difference between revisions of "Journal:Application of informatics in cancer research and clinical practice: Opportunities and challenges"

From LIMSWiki
 
====AI-supported radiotherapy====
In radiotherapy, images from different patients, time points, or modalities often need to be registered so that their information can be combined in a common coordinate system. Registering images with one another is relatively straightforward. However, registering images with pathology data (biomarkers) obtained or analyzed through different modalities remains an open problem. At present, predicting biomarkers from images does not achieve accurate point-to-point matching. One study set up a conditional generative adversarial network (cGAN) that generates synthetic computed tomography (sCT) images from low-field MR images of the pelvis and abdomen. It then compared the dose-volume histograms obtained on the sCT with those obtained on the original CT. [42] Deep learning has also been used to improve the quality and efficiency of deformable image registration (DIR). [43] Nonrigid anatomical motion of the patient between image acquisitions is unavoidable, so DIR must establish a voxel-to-voxel correspondence between two medical images that reflects these two different anatomical instances. [44, 45] Treatment planning also benefits from AI and information technologies, and a range of studies on dose prediction or validation has been published in recent years. Multiple dose levels, radiation-sensitive critical structures near target organs, and tumors in the abdomen and the head and neck were the most researched areas among recent achievements. [46] To enable accurate MRI-based dose calculations, Matteo et al. generated sCT from T1-weighted MRI using three 2D cGANs. [47] Furthermore, measurements from electronic portal imaging devices [48] and kV cone-beam computed tomography images [49] have been used to reconstruct the 3D dose distribution in radiotherapy treatment.
AI also supports radiotherapy outcome prediction. For example, a dual-input-channel hybrid deep learning model was developed to improve the prediction of grade 4 radiotherapy-induced lymphopenia. This model efficiently integrates the entire set of dosimetric parameters from radiation treatment planning. [50]
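The dose-volume histogram (DVH) comparison described above can be illustrated with a minimal sketch. It assumes that per-voxel doses (in Gy) for one structure are already available for both the original CT and the sCT. It also assumes that all voxels have equal volume. The function names and dose values are hypothetical, and the sketch does not reproduce the method of the cited study.

```python
from typing import List, Sequence


def cumulative_dvh(voxel_doses: Sequence[float], dose_levels: Sequence[float]) -> List[float]:
    """Cumulative DVH: fraction of the structure volume receiving at least each dose level.

    Assumes every voxel has the same volume, so volume fractions equal voxel-count fractions.
    """
    n = len(voxel_doses)
    return [sum(1 for d in voxel_doses if d >= level) / n for level in dose_levels]


def max_dvh_difference(dvh_a: Sequence[float], dvh_b: Sequence[float]) -> float:
    """Largest absolute volume-fraction difference between two DVHs on the same dose grid."""
    return max(abs(a - b) for a, b in zip(dvh_a, dvh_b))


# Hypothetical per-voxel doses (Gy) for one structure, computed on the original CT and on the sCT
levels = [0, 10, 20, 30, 40, 50]
dvh_ct = cumulative_dvh([10, 20, 30, 40, 50], levels)
dvh_sct = cumulative_dvh([10, 21, 29, 40, 51], levels)
diff = max_dvh_difference(dvh_ct, dvh_sct)
print(dvh_ct)          # [1.0, 1.0, 0.8, 0.6, 0.4, 0.2]
print(dvh_sct)         # [1.0, 1.0, 0.8, 0.4, 0.4, 0.2]
print(round(diff, 3))  # 0.2
```

Real comparisons usually work on finer dose grids and also report clinical dose metrics. This sketch shows only the cumulative-histogram step.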
 
===Cancer multi-omics research===
Unlike evidence-based medicine, studies on precision oncology should be data-driven, and omics data are among the most critical. Omics is a family of biotechnologies that analyze, at different levels, the structure and function of the full complement of molecules underlying a given biological function. High-throughput technologies have facilitated the investigation of the genome, transcriptome, proteome, and metabolome. These include [[next-generation sequencing]] (NGS) and [[mass spectrometry]] (MS)-based techniques such as [[liquid chromatography]]–[[tandem mass spectrometry]] (LC-MS/MS). Compared with single-level omics, multi-omics approaches can reveal the molecular mechanisms underlying different phenotypic manifestations of cancer from multiple dimensions. Thus, multi-omics has been proposed as the key to precision oncology in clinical practice. Together, these omics data can help reveal the complex molecular mechanisms of different diseases. [51] Multi-omics approaches generate more information than single-omics approaches, and how to register (align) multi-omics data deserves further research.
 
====Genomics, proteomics, metabolomics, and microbiomics in cancer research====
Scientists have used DNA sequencing techniques to identify several mutated cancer genes, such as ''PIK3CA'', ''EGFR'', and ''HER2''. [52-54] In recent years, the application of NGS to DNA sequencing, coupled with analytical methods, has enabled unprecedented speed and precision in decoding human genomes. [55] NGS techniques have also dramatically reduced the cost of sequencing. Massively parallel sequencing provides further insight into cancer from various aspects, including diagnosis, classification, therapeutics, and risk prediction. [56] Beyond differences in gene expression, one study has suggested that DNA methylation, a reversible DNA modification, can be used as an indicator of cancer status. [57] The identification of epigenetic modifications is defined as epigenomics. These modifications include DNA methylation, acetylation, histone modification, and nucleosome remodeling, and they are critical in regulating the biological processes fundamental to cancer genesis. [58] Genetic and environmental factors, among others, can affect these modifications, which might be long-lasting or even heritable. [59-61] Hence, epigenomic data have great potential for interpreting genetic variants in cancer. Unlike DNA, RNA molecules change over time in response to cellular, environmental, extracellular, and developmental stimuli. NGS has also facilitated transcriptomic studies, because RNA sequencing (RNA-seq) can identify both the presence and the abundance of RNA transcripts genome-wide. [62] Transcriptomic studies have revealed characteristic gene expression signatures in various cancer types. These signatures can inform clinical decisions, including diagnosis, treatment choice, and disease management. Furthermore, findings from several clinical trials have been applied to predict the prognosis of different cancers, such as breast and lung cancer. [63, 64]
Gene expression sequencing has also been extended to single cells, which enriches the data on cancer cells and helps us understand cancer heterogeneity. [65, 66]
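Estimating transcript abundance from RNA-seq read counts requires normalizing for both gene length and sequencing depth. One widely used measure is transcripts per million (TPM). The sketch below computes TPM from raw counts and gene lengths. The gene names and numbers are hypothetical. Real quantification tools typically use effective transcript lengths rather than the raw lengths assumed here.

```python
from typing import Dict


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million from read counts and gene/transcript lengths (bp)."""
    # Step 1: length normalization -> reads per kilobase (RPK)
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: depth normalization, so that the values of one sample sum to one million
    per_million = sum(rpk.values()) / 1e6
    return {gene: value / per_million for gene, value in rpk.items()}


# Hypothetical sample: equal read counts, but geneB is twice as long as geneA
counts = {"geneA": 100, "geneB": 100}
lengths = {"geneA": 1000, "geneB": 2000}
result = tpm(counts, lengths)
print(result)  # geneA gets twice the TPM of geneB; the values sum to 1e6
```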
 
In cancer research, proteomic data have contributed to the development of biomarkers for cancer identification and classification. They have also supported the prediction of drug sensitivity and the identification of proteins that may mediate drug resistance in different cancer types. [67-69] LC-MS/MS techniques have provided a platform for proteomic analysis, for example, for characterizing proteomic alterations in various cancer tissues. LC-MS/MS can also be applied to small molecules, which allows metabolomic data to be studied. Compared with the omics fields mentioned above, metabolomics is relatively new. Most studies of cancer metabolomics have focused on identifying biomarkers in plasma or serum samples, such as unsaturated free fatty acids in colorectal cancer and citrate changes in prostate cancer. [70, 71] Furthermore, microbiomic data provide new insights into cancer research and further information on the molecular mechanisms underlying cancer genesis and development. Dysbiosis of the symbiotic microbiota has been suggested to be related to several types of cancer. [72] Beyond triggering or promoting cancer, the microbiome can also be exploited in cancer therapy, both as a therapeutic target and through microbiota transplantation. [72, 73]
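Candidate protein or metabolite biomarkers are often first screened by comparing abundances between groups. The sketch below computes a log2 fold change and a Welch t statistic for one feature. The intensity values are hypothetical. A real analysis would also need normalization, p-values, and correction for testing many features at once.

```python
import math
from statistics import mean, variance
from typing import Sequence


def log2_fold_change(case: Sequence[float], control: Sequence[float]) -> float:
    """log2 of the ratio of mean abundances (assumes positive intensities)."""
    return math.log2(mean(case) / mean(control))


def welch_t(case: Sequence[float], control: Sequence[float]) -> float:
    """Welch's t statistic: difference in means over the unpooled standard error."""
    se = math.sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


tumor = [4.0, 6.0, 5.0]   # hypothetical intensities in tumor samples
normal = [1.0, 3.0, 2.0]  # hypothetical intensities in normal samples
fc = log2_fold_change(tumor, normal)
t = welch_t(tumor, normal)
print(round(fc, 3), round(t, 3))  # 1.322 3.674
```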
 
====Integrated multi-omics analysis for precision oncology====
The integration and analysis of high-throughput omics data are complex but critical. Data-driven methods include deep learning, network-based methods, clustering, feature extraction, transformation, and factorization. These methods link omics data to the clinical and molecular features of cancer. [74] Multi-omics studies on cancer pursue many goals, including biomarker discovery, subgroup identification, molecular pathway analysis, and drug repurposing/discovery. Table 1 summarizes some multi-omics studies conducted on cancer in recent years. These findings have contributed to precision oncology in clinical decision-making and mechanism studies.
 
{|
| style="vertical-align:top;" |
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="70%"
|-
  | colspan="8" style="background-color:white; padding-left:10px; padding-right:10px;" |'''Table 1.''' Examples of multi-omics studies in cancer
|-
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" rowspan="2" |Article
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" rowspan="2" |Objective
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" rowspan="2" |Cohort/database
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" colspan="5" |Omics data
|-
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" |Genomics
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" |Epigenomics
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" |Transcriptomics
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" |Proteomics
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;" |Metabolomics
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Chaudhary ''et al.'' [75]
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Survival prediction
  | style="background-color:white; padding-left:10px; padding-right:10px;" |TCGA + multiple centers
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
|- 
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Zhang ''et al.'' [76]
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Analysis of tumor heterogeneity
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Single center
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
|- 
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Seal ''et al.'' [77]
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Estimating gene expression
  | style="background-color:white; padding-left:10px; padding-right:10px;" |TCGA
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
|- 
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Ouyang ''et al.'' [78]
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Biomarker identification and subtyping
  | style="background-color:white; padding-left:10px; padding-right:10px;" |TCGA + GEO
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
|- 
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Huang ''et al.'' [79]
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Establishing a model for survival prediction
  | style="background-color:white; padding-left:10px; padding-right:10px;" |TCGA
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
|- 
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Löffler ''et al.'' [80]
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Validation of therapeutic target
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Single center
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
|- 
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Shen ''et al.'' [81]
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Analysis of molecular pathways in hepatocellular carcinoma (HCC) cells
  | style="background-color:white; padding-left:10px; padding-right:10px;" |HepG2 cell line
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
  | style="background-color:white; padding-left:10px; padding-right:10px;" |☑
|-
|}
|}
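A simple "early integration" sketch can illustrate several of the integration strategies named above, namely feature transformation followed by clustering for subgroup identification. Each omics block is standardized per feature and weighted so that large blocks do not dominate. The blocks are then concatenated per sample and clustered with k-means. This is a minimal, pure-Python illustration that assumes all blocks share the same samples in the same order. The block names and values are hypothetical, and the sketch is not the method of any study in Table 1.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows = samples, columns = features


def zscore_columns(block: Matrix) -> Matrix:
    """Standardize each feature (column) to mean 0 and SD 1 across samples."""
    n = len(block)
    out_cols = []
    for col in zip(*block):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0  # constant feature: avoid /0
        out_cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*out_cols)]


def early_integrate(blocks: Dict[str, Matrix]) -> Matrix:
    """Standardize each block, weight it by 1/sqrt(#features), and concatenate per sample."""
    scaled = []
    for block in blocks.values():
        # After z-scoring, this weight gives every block a total variance of 1
        weight = 1.0 / math.sqrt(len(block[0]))
        scaled.append([[weight * x for x in row] for row in zscore_columns(block)])
    n_samples = len(scaled[0])
    return [[x for blk in scaled for x in blk[i]] for i in range(n_samples)]


def kmeans(data: Matrix, k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; the first k samples are the initial centers (deterministic, for illustration)."""
    centers = [list(row) for row in data[:k]]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c, r=row: sum((a - b) ** 2 for a, b in zip(r, centers[c])))
            for row in data
        ]
        # Update step: move each center to the mean of its assigned samples
        for c in range(k):
            members = [row for row, label in zip(data, labels) if label == c]
            if members:
                centers[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels


blocks = {
    "expression": [[10, 1], [1, 10], [9, 2], [2, 9]],  # hypothetical: 4 samples x 2 genes
    "methylation": [[0.9], [0.1], [0.8], [0.2]],       # hypothetical: 4 samples x 1 CpG site
}
integrated = early_integrate(blocks)
labels = kmeans(integrated, k=2)
print(labels)  # [0, 1, 0, 1]
```

Late-integration alternatives instead analyze each block separately and combine the results afterwards. Factorization methods learn shared latent factors rather than concatenating raw features.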
 
====Biomarker identification for cancer prevention, diagnosis, and prognosis====
Molecular biomarkers identified from omics data are often used for cancer prevention and diagnosis because they can detect disease early. Cancer surveillance can be improved by identifying clinically relevant biomarkers that support early prevention of disease and predict prognosis to guide effective treatment. Examples include carcinoembryonic antigen for monitoring the recurrence of colorectal cancer [82, 83] and mutations in estrogen receptor 1 (ESR1) for predicting prognosis and treatment outcomes in breast cancer. [84] Furthermore, shallow whole-genome sequencing has recently been applied to diagnostics in breast cancer [85], lung cancer [86], and neuroblastoma. [87]
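How well a single candidate biomarker separates patients with cancer from controls is often summarized by the area under the ROC curve (AUC). The sketch below uses the equivalent Mann-Whitney formulation: the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half. The biomarker values are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC = P(case value > control value) + 0.5 * P(tie), over all case-control pairs."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical serum biomarker levels (arbitrary units)
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.2]
auc = auc_mann_whitney(cases, controls)
print(round(auc, 3))  # 0.889
```

The pairwise loop is quadratic. For large cohorts, the same value is usually obtained from ranks.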
 
==Challenges==
 
 




* '''NCDB''': National Cancer Database
* '''DIR''': deformable image registration
* '''DNN''': deep neural network
* '''EMR''': electronic medical record
* '''EPID''': electronic portal imaging device
* '''ESR1''': estrogen receptor 1
* '''ICD-O''': International Classification of Diseases for Oncology

Revision as of 23:33, 5 December 2022

Full article title Application of informatics in cancer research and clinical practice: Opportunities and challenges
Journal Cancer Innovation
Author(s) Hong, Na; Sun, Gang; Zuo, Xiuran; Chen, Meng; Liu, Li; Jiani, Wang; Feng, Xiaobin; Shi, Wenzhao; Gong, Mengchun; Ma, Pengcheng
Author affiliation(s) Digital Health China Technologies Co., Xinjiang Cancer Center, Huazhong University of Science and Technology, Southern Medical University, Chinese Academy of Medical Sciences and Peking Union Medical College, Tsinghua University
Primary contact Email: gmc at nrdrs dot org
Year published 2022
Volume and issue 1(1)
Page(s) 80–91
DOI 10.1002/cai2.9
ISSN 2770-9183
Distribution license Creative Commons Attribution 4.0 International
Website https://onlinelibrary.wiley.com/doi/10.1002/cai2.9
Download https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/cai2.9 (PDF)

Abstract

Cancer informatics has progressed significantly in the big data era. We summarize the application of informatics approaches to the cancer domain from both the informatics perspective (e.g., data management and data science) and the clinical perspective (e.g., cancer screening, risk assessment, diagnosis, treatment, and prognosis). We discuss various informatics methods and tools that are widely applied in cancer research and practice. These include cancer databases, data standards, terminologies, high-throughput omics data mining, machine learning algorithms, artificial intelligence imaging, and intelligent radiation. We also address informatics challenges in the cancer field in the pursuit of better treatment decisions and patient outcomes, and focus on how informatics can provide opportunities for cancer research and practice. Finally, we conclude that the interdisciplinary nature of cancer informatics and collaborations are major drivers for future research and applications in clinical practice. We hope that this review, with its informatics-specific insights, will be instrumental for cancer researchers and clinicians.

Keywords: artificial intelligence application, cancer informatics, machine learning

Introduction

Advances in information science and technology have brought significant benefits to cancer research and care. These include larger study cohorts, more complete follow-up, more effective clinician teams, lower costs, increased patient life expectancy, and improved quality of life. Diagnosis, prognosis, treatment, and other cancer-related aspects have improved significantly. Even so, cancer remains one of the most significant challenges in medical science because of disease heterogeneity and the need to identify underlying biomarkers potentially linked to specific cancer types.

Cancer informatics is a branch of medical informatics that applies information science, computer science, data science, and information technologies to the field of oncology. This is an area that deals with the resources, devices, and methods required to optimize the acquisition, storage, retrieval, and use of information in cancer. Applied cancer informatics transforms clinical data into meaningful and useful information to improve processes and outcomes in patient-focused and evidence-based cancer care. [1] The fundamental goals of cancer informatics are: (1) to organize data in a way that is comprehensible and meaningful to clinicians, researchers, and patients; (2) to use data to advance cancer care and treatment; and (3) to yield new insights through data analysis. [2]

The multidisciplinary field of cancer informatics includes oncology, pathology, radiology, computational biology, physical chemistry, computer science, information systems, information management, biostatistics, clinical informatics, bioinformatics, imaging informatics, machine learning (ML), artificial intelligence (AI), data mining, data compliance, and many other disciplines. The integration and intersection of these disciplines bridge the gaps between individual cancer-related fields and promote cancer research and clinical practice.

From an informatics perspective, methods and tools enhance the classification, accessibility, and application of oncology data, which translates into better cancer treatment outcomes. For example, radiomics and AI have flourished with the development of clinical and imaging oncology databases. Together, they provide clinicians with a technological foundation for the early detection and treatment of cancer. In clinical practice, radiologists are under tremendous pressure as the number of cancer patients increases rapidly. Studies in AI-supported radiotherapy aim to make radiotherapy easier and faster, turning a labor-intensive procedure into a technology-intensive task. Another example is multi-omics analysis for precision oncology. Multi-omics analyses can effectively overcome the limitations of single omics by integrating large amounts of molecular-level biological data across different dimensions. These dimensions include the genome, epigenome, transcriptome, proteome, metabolome, and microbiome. Moreover, multi-omics analyses provide multi-level analyses and interpretations of complex biological phenomena influenced by many factors, such as biological processes and diseases. Next-generation high-throughput technologies have become widespread, and large amounts of multi-omics data have accumulated. As a result, integration and fusion analysis for the precise diagnosis and treatment of cancer has become an emerging trend.

To summarize the current progress in informatics methods and tools to enhance cancer research and improve cancer clinical practices, we reviewed the most common recent scenarios of informatics-supported applications. A graphic abstract summarizing the field of cancer informatics is depicted in Figure 1.



Figure 1. A summary of the main points of cancer informatics. AI = artificial intelligence.

Informatics-supported applications of cancer research and clinical practices

Informatics-based publications are available from the National Library of Medicine database (PubMed) and from officially released web resources. These cover cancer databases, cancer knowledge organization systems, cancer omics, and precision medicine, as well as AI-supported cancer imaging and radiotherapy. In this review, retrieved articles were manually screened against criteria covering the following items: aim of the study, methods, results, and clinical scenarios.

Databases and data standards for oncology

Healthcare data stored in various electronic systems follow different formats, both structured and unstructured. Medical records contain critical information that supports cancer therapy, and storing, extracting, and encoding such information plays an important role in cancer treatment and research. Population-based cancer registry databases record information on incidence, mortality, and treatment outcomes, from which annual statistics are generated. [3] In contrast, hospital-based cancer databases provide more clinical information than population-based cancer registries. This information includes patient information, clinicopathological information, genomic data, disease staging, treatment, follow-up, lab test results, and medical records, and it supports clinical research and improves the care of cancer patients. [3, 4] Furthermore, integrating data collected from different sources, which may be encoded in various terminological standards, requires a consistent coding system. [5] In addition, ontologies, which integrate knowledge, annotations, and concepts, play an important role in cancer treatment and research.
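The need for a consistent coding system can be made concrete with a small sketch. It maps each source's local codes for the same concept onto one shared target code. Anything that cannot be mapped is flagged for review rather than silently dropped. The source names, local codes, and crosswalk are hypothetical. The targets shown are ICD-O-3 topography codes for breast, NOS (C50.9) and lung, NOS (C34.9). In practice, crosswalks to standards such as ICD-O, SNOMED CT, or the NCIt are curated by terminology experts.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical crosswalk: (source system, local site code) -> ICD-O-3 topography code
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("hospital_a", "BRST"): "C50.9",           # breast, NOS
    ("hospital_a", "LUNG"): "C34.9",           # lung, NOS
    ("hospital_b", "breast"): "C50.9",
    ("hospital_b", "lung_bronchus"): "C34.9",
}


def harmonize(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (records with an added 'site_icdo3' field, records that need manual review)."""
    mapped, unmapped = [], []
    for rec in records:
        target = CROSSWALK.get((rec["source"], rec["site_code"]))
        if target is None:
            unmapped.append(rec)  # kept for curation instead of being discarded
        else:
            mapped.append({**rec, "site_icdo3": target})
    return mapped, unmapped


records = [
    {"patient": "p1", "source": "hospital_a", "site_code": "BRST"},
    {"patient": "p2", "source": "hospital_b", "site_code": "breast"},
    {"patient": "p3", "source": "hospital_b", "site_code": "liver"},
]
mapped, unmapped = harmonize(records)
print([r["site_icdo3"] for r in mapped])  # ['C50.9', 'C50.9']
print([r["patient"] for r in unmapped])   # ['p3']
```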

Cancer databases and scientific programs

The databases built by the National Cancer Institute's (NCI's) Surveillance, Epidemiology, and End Results (SEER) program in 1973 and by the Centers for Disease Control and Prevention's (CDC's) National Program of Cancer Registries (NPCR) of the United States in 1995 are used to construct the US Cancer Statistics (USCS). [6, 7] Data from the National Central Cancer Registry of China are used to produce cancer statistics in China. [8, 9] The National Cancer Database (NCDB) of the United States is one of the largest clinical cancer registry databases. Over 34 million records of commonly diagnosed solid tumors have been added to it since 1989, and a growing number of published studies draw on it. [10, 11] Moreover, thousands of new genomes have been sequenced over the past few years. [12] The Cancer Genome Atlas (TCGA) was initiated in 2006 and has characterized more than 20,000 primary cancers at the molecular level, covering 33 cancer types to date. This database contains genomic, expression, methylation, copy number variation, epigenomic, transcriptomic, and proteomic data amounting to more than 2.5 petabytes. [13, 14] The International Cancer Genome Consortium (ICGC) supports genomic studies of more than 50 cancer types. These studies involve more than 25,000 cancer genomes at the genomic, epigenomic, and transcriptomic levels. [15]

Cancer classification, terminology, and ontology

Cancer classification is a primary concern in patient treatment. The International Classification of Diseases for Oncology (ICD-O), published by the World Health Organization (WHO), is widely implemented for classifying tumors. ICD-O uses a multi-axial coding system to classify the anatomical site (topography) and the histology (morphology) of a tumor. The first, second, and third editions of ICD-O were published in 1976, 1991, and 2000, respectively. [16-18] The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) uses concepts, descriptions, and relationships to build terminology systems that can map and link to other standards. [19, 20] SNOMED CT has been used to encode cancer pathology checklists that aim to provide interoperable and portable diagnostic, prognostic, and predictive elements. [21, 22] The NCI has published a comprehensive logic-based terminology, the National Cancer Institute Thesaurus (NCIt). It covers cancer-related components such as clinical findings, drugs, treatments, anatomy, genes, proteins, and molecular information. [23] Adverse events (AEs) are a critical element in cancer clinical trials and research. They are recorded using dictionaries such as the Common Terminology Criteria for Adverse Events (CTCAE), developed by the NCI, and the Medical Dictionary for Regulatory Activities (MedDRA), developed by the International Conference on Harmonization (ICH). [24, 25] HemOnc.org contains information on the mechanisms, U.S. Food and Drug Administration (FDA) approval, common usage, and synonyms of drugs and regimens. It is published and maintained to keep pace with the growing number of chemotherapeutic regimens and combines various terminologies, such as RxNorm, SNOMED CT, and the NCIt. [19, 26, 27]
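ICD-O's separate topography and morphology axes are visible in how a single tumor is coded. For example, topography code C50.9 (breast, not otherwise specified) is paired with morphology code 8500/3 (infiltrating duct carcinoma, not otherwise specified). The digit after the slash is the behavior code. The sketch below splits such a pair into its components, assuming ICD-O-3 formats. It only checks code format, not whether a code exists in the classification. The function and field names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ICD-O-3 behavior codes (the digit after the slash in the morphology code)
BEHAVIOR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic site",
}

TOPOGRAPHY_RE = re.compile(r"^C(\d{2})\.(\d)$")           # e.g., C50.9
MORPHOLOGY_RE = re.compile(r"^(\d{4})/([012369])(\d)?$")  # e.g., 8500/3, optional grade digit


@dataclass(frozen=True)
class IcdO3Code:
    site_category: str    # three-character topography category, e.g., "C50"
    subsite: str          # digit after the decimal point
    histology: str        # four-digit histology (cell type) code
    behavior: str         # decoded behavior code
    grade: Optional[str]  # optional sixth digit, not interpreted here


def parse_icdo3(topography: str, morphology: str) -> IcdO3Code:
    t = TOPOGRAPHY_RE.match(topography.strip().upper())
    m = MORPHOLOGY_RE.match(morphology.strip())
    if t is None:
        raise ValueError(f"not an ICD-O-3 topography code: {topography!r}")
    if m is None:
        raise ValueError(f"not an ICD-O-3 morphology code: {morphology!r}")
    return IcdO3Code(
        site_category="C" + t.group(1),
        subsite=t.group(2),
        histology=m.group(1),
        behavior=BEHAVIOR[m.group(2)],
        grade=m.group(3),  # None when no grade digit is present
    )


code = parse_icdo3("C50.9", "8500/3")
print(code.site_category, code.histology, code.behavior)  # C50 8500 malignant, primary site
```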

Cancer Care Treatment Outcome Ontology (CCTOO) describes treatment or trial endpoints for patients with solid tumors. It uses four domains, 13 subgroups, and two hierarchical concept structures, with a total of 1,133 terms. [28] TNM-Ontology (TNM-O) consists of four parts: a representation of the primary tumor (T), of regional lymph nodes (N), of distant metastases (M), and of the anatomical location of the tumor. It defines different T, N, and M code descriptors for tumors at different anatomical locations. TNM-O was implemented in a colorectal cancer database and achieved a 100% concordance rate after validation by experienced pathologists. [29] The Radiation Oncology Ontology (ROO) was published using Semantic Web technologies and forms a hierarchical structure containing 1,183 classes and 211 properties between classes. [30] The Radiation Oncology Structures (ROS) ontology was developed as a taxonomic hierarchy of 417 classes, each with a number of subclasses, 81% of which can be mapped to the Unified Medical Language System (UMLS). [31] Cancer Cell Ontology (CCL) was published to represent cancer cell types via immune phenotypes in the field of hematological malignancies. It contains a total of 6,900 classes, including over 300 newly added classes. [32] Prostate Cancer Ontology (PCO) represents integrated information from multiple prostate databases using a nine-level hierarchical structure with 412 concepts. [33] Local terminologies, such as the Cervical Cancer Common Terminology [34], are used to support semantic interoperability and the use of local clinical data.
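TNM-O splits a stage into separate T, N, and M components plus an anatomical site, and a simple record type can mirror that split. The sketch below parses a compact clinical TNM string into such a record. It is a hypothetical, simplified representation, not the TNM-O ontology itself. It uses generic category patterns only, with no c/p/y prefixes, no site-specific descriptors, and no stage grouping.

```python
import re
from dataclasses import dataclass

# Generic, simplified patterns: T (TX, Tis, T0-T4 with optional a-d), N (NX, N0-N3 with a-c), M (M0/M1 with a-c)
TNM_RE = re.compile(r"^T(X|is|[0-4][a-d]?)N(X|[0-3][a-c]?)M([01][a-c]?)$")


@dataclass(frozen=True)
class TnmRecord:
    site: str  # anatomical location, the fourth part described for TNM-O
    t: str
    n: str
    m: str


def parse_tnm(site: str, code: str) -> TnmRecord:
    """Parse strings such as 'T3 N1 M0' or 'TisN0M0' (spaces are ignored)."""
    match = TNM_RE.match(code.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized TNM string: {code!r}")
    t, n, m = match.groups()
    return TnmRecord(site=site, t="T" + t, n="N" + n, m="M" + m)


record = parse_tnm("colon", "T3 N1 M0")
print(record)  # TnmRecord(site='colon', t='T3', n='N1', m='M0')
```

A fuller model would attach site-specific allowed descriptors to each anatomical location, which is the part TNM-O formalizes.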

AI-supported image processing and radiotherapy

Medical imaging is a useful and important modality for cancer detection, progression monitoring, and prognosis prediction. Radiomics and radiotherapy are the two medical research and application areas that AI has advanced most. Radiomics refers to converting images into structured, mineable data. [35] Most AI-supported imaging applications focus on early screening and diagnosis, using ML methods based on predefined features extracted from medical images. [36] Radiation therapy is a pivotal cancer treatment that has progressed significantly over the last decade owing to numerous technological breakthroughs. Steps in the traditional radiation therapy workflow that would benefit from AI include imaging, treatment planning, quality assurance, and outcome prediction. Many recent studies have shown that the adoption of radiomics and ML has paved the way for improved management of radiation therapy patients.
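
The idea of converting an image into "structured, mineable data" can be illustrated with a few first-order (intensity-histogram) features computed inside a region of interest. The Python sketch below assumes a 2D image and a binary mask given as nested lists; the function name `first_order_features`, the fixed-width binning with `n_bins=8`, and the particular features chosen are illustrative assumptions, not the pipeline of any cited study. Standardized radiomics software computes many more features (shape, texture) under documented definitions.

```python
import math
from collections import Counter


def first_order_features(image, mask, n_bins=8):
    """Compute simple first-order radiomic features inside a binary ROI mask."""
    # Collect intensities of pixels inside the region of interest.
    values = [v for img_row, msk_row in zip(image, mask)
              for v, m in zip(img_row, msk_row) if m]
    if not values:
        raise ValueError("mask selects no pixels")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    skewness = (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std > 0 else 0.0

    # Discretize into fixed-width bins, then compute Shannon entropy (bits).
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant ROI: every value lands in bin 0
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"pixels": n, "mean": mean, "variance": variance,
            "skewness": skewness, "entropy": entropy}


image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
features = first_order_features(image, mask)
```

Here the cross-shaped mask selects the intensities 20, 40, 50, 60, and 80, so the resulting feature vector (mean 50, variance 400, zero skewness) is one row of the tabular data that downstream ML models consume.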

AI imaging and diagnostics

AI has contributed to medical imaging by improving image quality, computer-aided image interpretation, and radiomics in most oncology-related diagnoses. AI applications are becoming crucial in radiology across modalities such as X-ray, ultrasound, computed tomography, magnetic resonance imaging (MRI), positron emission tomography (PET), and digital pathology. Analyzing these quantitative image data yields predictive models for diagnosis, prognosis, and longitudinal monitoring that are based on a parsimonious set of informative imaging features. Highly specialized algorithms analyze the images with increased speed and accuracy.
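
One simple way to arrive at a "parsimonious set of informative imaging features" is a filter method: rank candidate features by their association with the outcome and discard features that are nearly redundant with ones already chosen. The Python sketch below is a minimal illustration under that assumption; the feature names, the choice of Pearson correlation, and the `max_redundancy=0.9` cutoff are hypothetical, and published radiomics models often use other selectors (for example, penalized regression).

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_parsimonious(features, outcome, k=2, max_redundancy=0.9):
    """Greedy filter: rank features by |r| with the outcome, skip near-duplicates."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], outcome)),
                    reverse=True)
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        if all(abs(pearson(features[name], features[s])) < max_redundancy
               for s in selected):
            selected.append(name)
    return selected


# Hypothetical per-patient feature values; outcome 1 = malignant, 0 = benign.
outcome = [0, 0, 0, 1, 1, 1]
features = {
    "texture_a": [1, 2, 1, 5, 6, 5],
    "texture_a_rescaled": [2, 4, 2, 10, 12, 10],  # redundant copy of texture_a
    "shape_b": [3, 1, 2, 2, 3, 1],
    "intensity_c": [0, 1, 0, 1, 0, 1],
}
chosen = select_parsimonious(features, outcome)
```

The rescaled copy of `texture_a` is exactly as predictive as the original but adds no new information, so the redundancy check drops it and the second slot goes to a weaker but independent feature.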

According to a number of papers published in recent years, the cancer sites most frequently addressed are the breast, kidney, brain, lung, prostate, cervix, and liver. The main AI algorithms used are convolutional neural networks (CNNs), neural networks (NNs), support vector machines (SVMs), deep neural networks (DNNs), and ensemble learning techniques. [37] A recent study described the development and validation of an automated, deep learning-based detection system for chest radiography. [38] This system is designed to diagnose common thoracic diseases, including lung malignancies, and the study reported that it outperformed human observers in image recognition and analysis. Breast imaging offers another example. Mammography is the first-line imaging screen for breast cancer, while ultrasound is preferred for younger women with dense breast tissue. A previous study of AI in breast imaging compared the interpretation of mammograms with and without AI assistance. [39] Unsurprisingly, radiologists assisted by AI analyzed mammograms faster and more accurately, which is vital for the rapid detection of cancer. Further research on AI in medical imaging is expected to focus on improving speed and reducing costs. [40, 41] Previous studies have also reported AI tools developed by Google that can search for morphologically similar features regardless of annotation status. [41] For example, LYmph Node Assistant (LYNA), a Google-developed deep learning algorithm, can detect metastatic breast cancer on slides with up to 99% accuracy.
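
Performance claims such as "more accurately" or "up to 99% accuracy" depend on which metric is reported, and on imbalanced screening data, accuracy, sensitivity, specificity, and the area under the ROC curve (AUC) can differ markedly for the same predictions. The Python sketch below computes all four; the labels, scores, and the 0.5 decision threshold are hypothetical, and the AUC is computed with the rank (Mann-Whitney) formulation rather than by integrating an ROC curve.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reads: 3 cancers among 10 images, scored by a model.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05, 0.02, 0.01]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In this toy case the model reaches 90% accuracy and 100% specificity while missing one of three cancers (sensitivity about 67%), and its AUC is about 0.95, which is why the metric behind a headline number matters when comparing AI-assisted and unassisted reading.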

AI-supported radiotherapy

In radiotherapy, images acquired from different patients, at different times, or with different modalities often need to be registered so that their information can be combined in a common coordinate system. Registering images to one another is relatively straightforward. However, registering images with pathology (biomarker) data obtained or analyzed with different modalities remains an open problem: current predictions of biomarkers from images do not achieve accurate point-to-point matching. One study used a conditional generative adversarial network (cGAN) to generate synthetic computed tomography (sCT) images from low-field MR images of the pelvis and abdomen and compared the dose-volume histograms obtained on the sCT and on the original CT. [42] Deep learning has also been used to improve the quality and efficiency of deformable image registration (DIR). [43] Because nonrigid anatomical motion of the patient between image acquisitions is unavoidable, DIR must establish a voxel-to-voxel correspondence between two medical images that reflects the two different anatomical instances. [44, 45]

Treatment planning also benefits from AI and information technologies, and an array of studies on dose prediction or validation has been published in recent years. Multiple dose levels, radiation-sensitive critical structures near target organs, and tumors in the abdomen and the head and neck were the most researched areas among recent achievements. [46] To enable accurate MRI-based dose calculations, Matteo et al. generated sCT from T1-weighted MRI using three 2D cGANs. [47] Furthermore, data from new devices, such as electronic portal imaging devices (EPIDs) [48] and kV cone-beam computed tomography (CBCT) [49], have been used to reconstruct the 3D dose distribution in radiotherapy treatment.
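
Comparing dose-volume histograms (DVHs), as in the sCT-versus-CT study above, reduces to computing for each dose level the fraction of a structure's volume receiving at least that dose. The Python sketch below assumes per-voxel doses (in Gy) for one structure with equal voxel volumes; the function names, dose values, and dose levels are hypothetical, and clinical systems additionally handle partial voxels and fine dose binning.

```python
import math


def cumulative_dvh(doses, levels):
    """Fraction of structure voxels receiving at least each dose level (Gy)."""
    n = len(doses)
    return [sum(1 for d in doses if d >= level) / n for level in levels]


def dose_at_volume(doses, volume_fraction):
    """D_v: highest dose received by at least `volume_fraction` of the voxels."""
    ranked = sorted(doses, reverse=True)
    k = max(1, math.ceil(volume_fraction * len(ranked)))  # voxels needed to cover v
    return ranked[k - 1]


def max_dvh_difference(doses_a, doses_b, levels):
    """Largest absolute gap between two cumulative DVHs at the given levels."""
    a = cumulative_dvh(doses_a, levels)
    b = cumulative_dvh(doses_b, levels)
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical per-voxel doses for one target, calculated on CT and on sCT.
ct_doses = [60.0, 59.0, 58.0, 50.0]
sct_doses = [60.5, 59.0, 57.0, 50.0]
levels = [50, 55, 58, 60]

ct_dvh = cumulative_dvh(ct_doses, levels)
gap = max_dvh_difference(ct_doses, sct_doses, levels)
d50 = dose_at_volume(ct_doses, 0.5)
```

A small maximum DVH gap and matching dose-volume points (such as D50) between the two calculations are the kind of evidence used to argue that an sCT can stand in for a planning CT.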

AI also supports radiotherapy outcome prediction: a dual-input-channel hybrid deep learning model that efficiently integrates the entire set of dosimetric parameters from radiation treatment planning was developed to improve the prediction of grade 4 radiotherapy-induced lymphopenia. [50]

Cancer multi-omics research

Unlike evidence-based medicine, precision oncology research should be data-driven, and omics data are among the most critical data sources. Omics approaches analyze the overall composition, structure, and function of a biological system at a given molecular level. High-throughput technologies, such as next-generation sequencing (NGS) and mass spectrometry (MS)-based techniques such as liquid chromatography-tandem mass spectrometry (LC-MS/MS), have made it possible to investigate the genome, transcriptome, proteome, and metabolome. Compared with single-level omics, multi-omics approaches can reveal the molecular mechanisms underlying different phenotypic manifestations of cancer from multiple dimensions. Multi-omics has therefore been proposed as the key to precision oncology in clinical practice, since together these omics data can help reveal the complex molecular mechanisms of different diseases. [51] Multi-omics approaches generate more information, but how to register (align) data across omics layers deserves further research.

Genomics, proteomics, metabolomics, and microbiomics in cancer research

Scientists have identified several mutated cancer genes, such as PIK3CA, EGFR, and HER2, through DNA sequencing. [52-54] In recent years, NGS-based DNA sequencing, coupled with analytical methods, has enabled unprecedented speed and precision in decoding human genomes [55], and NGS has dramatically reduced the cost of sequencing. Massively parallel sequencing provides further insight into cancer from various aspects, including diagnosis, classification, therapeutics, and risk prediction. [56] In addition to differences in gene expression, a study has suggested that DNA methylation, a reversible DNA modification, can be used as an indicator of cancer status. [57] The study of DNA and chromatin modifications, including DNA methylation, histone modifications such as acetylation, and nucleosome remodeling, is termed epigenomics. These modifications are critical in regulating biological processes fundamental to cancer genesis. [58] Genetic and environmental factors, among others, can affect these modifications, which may be long-lasting or even heritable. [59-61] Hence, epigenomics data has great potential for interpreting genetic variants in cancer. Compared with DNA, RNA molecules change over time in response to cellular, environmental, extracellular, and developmental stimuli. NGS has also facilitated transcriptomics studies, because RNA sequencing (RNA-seq) can identify both the presence and the abundance of RNA transcripts genome-wide. [62] Transcriptomics studies have revealed characteristic gene expression signatures in various cancer types that can inform clinical decisions, including diagnosis, treatment choices, and disease management. Furthermore, findings from several clinical trials have been applied to predict prognosis in different cancers, such as breast and lung cancer. [63, 64] Gene expression sequencing has also been extended to single cells, which enriches the data on cancer cells and helps us understand cancer heterogeneity. [65, 66]
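
Measuring transcript "abundance" from RNA-seq requires normalizing raw read counts, because longer transcripts and deeper-sequenced libraries both attract more reads. One widely used normalization is transcripts per million (TPM). The Python sketch below assumes raw counts and transcript lengths in base pairs are already available per transcript; the function name `tpm` and the example numbers are hypothetical, and real pipelines typically use effective (fragment-length-adjusted) lengths.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths (bp)."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must align")
    # Reads per kilobase: corrects for longer transcripts yielding more reads.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Rescale so that the values in each sample sum to one million.
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]


# Hypothetical sample: the second transcript is half as long as the first
# but received the same number of reads, so it is twice as abundant.
values = tpm([10, 10], [1000, 500])
```

Because TPM values always sum to one million within a sample, relative abundances can be compared across samples sequenced to different depths.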

In cancer research, proteomics data has contributed to the development of biomarkers for cancer identification and classification, the prediction of drug sensitivity, and the identification of proteins that may mediate drug resistance in different cancer types. [67-69] LC-MS/MS techniques provide a platform for proteomic analysis, for example, for characterizing proteomic alterations in various cancer tissues. LC-MS/MS can also be applied to small molecules, which allows metabolomics data to be studied. Compared with the omics mentioned above, metabolomics is a newer field, and most cancer metabolomics studies have focused on identifying biomarkers in plasma or serum samples, such as unsaturated free fatty acids in colorectal cancer and citrate changes in prostate cancer. [70, 71] Furthermore, microbiomics data offer new insights into cancer research and provide further information on the molecular mechanisms underlying cancer genesis and development. Dysbiosis of the symbiotic microbiota has been linked to several types of cancer. [72] Beyond triggering or promoting cancer, the microbiome can also be used in cancer therapies, for example as a therapeutic target and through microbiota transplantation. [72, 73]
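
Biomarker screening in metabolomics often starts by comparing each metabolite's measured intensity between patients and controls. The Python sketch below computes two common summary statistics for a single metabolite, the log2 fold change of group means and Welch's t statistic (which does not assume equal variances). The function name and intensities are hypothetical; a real screen would also convert t to a p-value and correct for testing many metabolites at once, which is omitted here.

```python
import math
import statistics


def differential_abundance(cases, controls):
    """Return (log2 fold change of means, Welch's t statistic) for one metabolite."""
    m1, m2 = statistics.fmean(cases), statistics.fmean(controls)
    if m1 <= 0 or m2 <= 0:
        raise ValueError("fold change needs positive mean intensities")
    # Sample variances (n - 1 denominator), one per group.
    v1, v2 = statistics.variance(cases), statistics.variance(controls)
    se = math.sqrt(v1 / len(cases) + v2 / len(controls))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return math.log2(m1 / m2), (m1 - m2) / se


# Hypothetical plasma intensities of one metabolite.
log2_fc, t_stat = differential_abundance([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
```

Here the metabolite is 2.5-fold higher in cases (log2 fold change of about 1.32) with a Welch t of about 3.67; candidate biomarkers are those with both a large fold change and strong statistical support.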

Integrated multi-omics analysis for precision oncology

The integration and analysis of high-throughput omics data are complex but critical. Data-driven integration methods include deep learning, network-based methods, clustering, feature extraction, transformation, and factorization, which link omics data to the clinical and molecular features of cancer. [74] Multi-omics studies of cancer pursue many goals, including biomarker discovery, subgroup identification, molecular pathway analysis, and drug repurposing/discovery. Table 1 summarizes some multi-omics studies of cancer conducted in recent years. These findings have contributed to precision oncology, both in clinical decision-making and in mechanistic studies.
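
One of the simplest integration strategies, often called early integration, standardizes each omics block so that no data type dominates by scale, concatenates the blocks per sample, and then clusters the result to look for patient subgroups. The Python sketch below illustrates that idea with plain k-means; it is not the method of any study in Table 1, and the function names, deterministic seeding with the first k samples, and example values are assumptions made for illustration.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each feature (column) of a samples-by-features block."""
    scaled_cols = []
    for col in zip(*rows):
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0  # a constant feature becomes all zeros
        scaled_cols.append([(v - mu) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def integrate(*omics_blocks):
    """Early integration: scale each omics block, then concatenate per sample."""
    scaled = [zscore_columns(block) for block in omics_blocks]
    n_samples = len(scaled[0])
    return [[v for block in scaled for v in block[i]] for i in range(n_samples)]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, deterministically seeded with the first k samples."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(dim) for dim in zip(*members)]
    return labels


# Hypothetical data for 4 tumors: 2 expression features and 1 methylation beta value.
expression = [[5.0, 1.0], [1.0, 5.0], [5.0, 1.2], [1.1, 5.0]]
methylation = [[0.9], [0.1], [0.8], [0.2]]
combined = integrate(expression, methylation)
subgroups = kmeans(combined, k=2)
```

Per-block standardization matters because expression values and methylation beta values (0 to 1) live on very different scales; a known limitation of concatenation is that blocks with many more features can still dominate the distance, which is one motivation for the factorization and deep learning methods cited above.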

Table 1. Examples of multi-omics studies in cancer
Article Objective Cohort/database Omics data
Genomics Epigenomics Transcriptomics Proteomics Metabolomics
Chaudhary et al. [75] Survival prediction TCGA + multiple center
Zhang et al. [76] Analysis of tumor heterogeneity Single center
Seal et al. [77] Estimating gene expression TCGA
Ouyang et al. [78] Biomarker identification and subtyping TCGA + GEO
Huang et al. [79] Establishing a model for survival prediction TCGA
Löffler et al. [80] Validation of therapeutic target Single center
Shen et al. [81] Analysis of molecular pathways in hepatocellular carcinoma (HCC) cells HepG2 cell line

Biomarker identification for cancer prevention, diagnosis, and prognosis

Molecular biomarkers identified from omics data are often used for cancer prevention and diagnostics by detecting disease early. Cancer surveillance can be improved by identifying clinically relevant biomarkers that support early prevention and predict prognosis to guide effective treatment, such as carcinoembryonic antigen to monitor the recurrence of colorectal cancer [82, 83] and mutations in estrogen receptor 1 (ESR1) to predict prognosis and treatment outcomes in breast cancer. [84] Furthermore, shallow whole-genome sequencing has recently been applied to diagnostics in breast cancer [85], lung cancer [86], and neuroblastoma. [87]
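
A common use of shallow (low-coverage) whole-genome sequencing is copy-number profiling: reads are counted in fixed genomic bins, each profile is scaled to its total read count, and bins are compared with a reference sample. The Python sketch below assumes that use; the bin counts, the pseudocount, and the ±0.3 log2 thresholds are hypothetical, and dedicated tools additionally correct for GC content and mappability and segment the genome before calling alterations.

```python
import math


def log2_ratios(sample_counts, reference_counts, pseudo=0.5):
    """Per-bin log2(sample/reference) after scaling each profile to its total reads."""
    s_total = sum(sample_counts)
    r_total = sum(reference_counts)
    return [math.log2(((s + pseudo) / s_total) / ((r + pseudo) / r_total))
            for s, r in zip(sample_counts, reference_counts)]


def call_bins(ratios, gain=0.3, loss=-0.3):
    """Label each bin as gain, loss, or neutral using fixed log2 thresholds."""
    return ["gain" if x >= gain else "loss" if x <= loss else "neutral"
            for x in ratios]


# Hypothetical read counts in four genomic bins.
reference = [100, 100, 100, 100]
tumor = [100, 100, 200, 50]
calls = call_bins(log2_ratios(tumor, reference))
```

In this toy profile the third bin shows a relative gain and the fourth a relative loss; note that library-size scaling shifts the unchanged bins slightly below zero, one reason real pipelines normalize against the median of neutral regions instead.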

Challenges

Abbreviations, acronyms, and initialisms

  • AE: adverse event
  • AI: artificial intelligence
  • CBCT: cone-beam computed tomography
  • CCTOO: Cancer Care Treatment Outcome Ontology
  • CDC: Centers for Disease Control and Prevention
  • cGAN: conditional Generative Adversarial Network
  • CNN: convolutional neural network
  • CT: computed tomography
  • CTCAE: Common Terminology Criteria for Adverse Events
  • DCDB: National Cancer Database
  • DIR: deformable image registration
  • DNN: deep neural network
  • EMR: electronic medical record
  • EPID: electronic portal imaging device
  • ESR1: estrogen receptor 1
  • ICD-O: International Classification of Diseases for Oncology
  • ICGC: International Cancer Genome Consortium
  • ICH: International Conference on Harmonization
  • LYNA: LYmph Node Assistant
  • MedDRA: Medical Dictionary for Regulatory Activities
  • ML: machine learning
  • MRI: magnetic resonance imaging
  • MS: mass spectrometry
  • NCCR: National Central Cancer Registry
  • NCDB: National Cancer Database
  • NCI: National Cancer Institute
  • NCIt: National Cancer Institute Thesaurus
  • NGS: next-generation sequencing
  • NN: neural network
  • NPCR: National Program of Cancer Registries
  • PCO: Prostate Cancer Ontology
  • PET: positron-emission tomography
  • ROO: Radiation Oncology Ontology
  • ROS: Radiation Oncology Structures
  • sCT: synthetic computed tomography
  • SEER: Surveillance, Epidemiology, and End Results
  • SNOMED CT: Systematized Nomenclature of Medicine Clinical Terms
  • SVM: support vector machine
  • TCGA: The Cancer Genome Atlas
  • TNM-O: TNM-Ontology
  • UMLS: Unified Medical Language System
  • USCS: US Cancer Statistics
  • WHO: World Health Organization

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.