Difference between revisions of "Journal:Infrastructure tools to support an effective radiation oncology learning health system"

From LIMSWiki
Jump to navigationJump to search
(Created stub. Saving and adding more.)
 
(Saving and adding more.)
(2 intermediate revisions by the same user not shown)
Line 40: Line 40:
==Background and significance==
==Background and significance==
For the past three decades, there is a growing interest in building [[learning organization]]s to address the most pressing and complex business, social, and economic challenges facing society today. [1] For healthcare, the National Academy of Medicine has defined the concept of a [[Learning health systems|learning health system]] (LHS) as an entity where science, incentive, culture, and [[Informatics (academic field)|informatics]] are aligned for continuous innovation, with new knowledge capture and discovery as an integral part for practicing evidence-based medicine. [2] The current dependency on [[Medical research|randomized controlled clinical trials]] that use a controlled environment for scientific evidence creation with only a small percent (<3%) of patient [[Sample (material)|samples]] is inadequate now and may be irrelevant in the future since these trials take too much time, are too expensive, and are fraught with questions of generalizability. The Agency for Healthcare Research and Quality has also been promoting the development of LHSs as part of a key strategy for healthcare organizations to make transformational changes to improve healthcare [[Quality (business)|quality]] and value. Large-scale healthcare systems are now recognizing the need to build infrastructure capable of [[Continual improvement process|continuous learning and improvement]] in delivering care to patients and address critical population health issues. [3] In an LHS, data collection should be performed from various sources such as [[electronic health record]]s (EHRs), treatment delivery records, [[imaging]] records, patient-generated data records, and administrative and claims data, which then allows for this aggregated data to be analyzed for generating new insights and knowledge that can be used to improve patient care and outcomes.
For the past three decades, there is a growing interest in building [[learning organization]]s to address the most pressing and complex business, social, and economic challenges facing society today. [1] For healthcare, the National Academy of Medicine has defined the concept of a [[Learning health systems|learning health system]] (LHS) as an entity where science, incentive, culture, and [[Informatics (academic field)|informatics]] are aligned for continuous innovation, with new knowledge capture and discovery as an integral part for practicing evidence-based medicine. [2] The current dependency on [[Medical research|randomized controlled clinical trials]] that use a controlled environment for scientific evidence creation with only a small percent (<3%) of patient [[Sample (material)|samples]] is inadequate now and may be irrelevant in the future since these trials take too much time, are too expensive, and are fraught with questions of generalizability. The Agency for Healthcare Research and Quality has also been promoting the development of LHSs as part of a key strategy for healthcare organizations to make transformational changes to improve healthcare [[Quality (business)|quality]] and value. Large-scale healthcare systems are now recognizing the need to build infrastructure capable of [[Continual improvement process|continuous learning and improvement]] in delivering care to patients and address critical population health issues. [3] In an LHS, data collection should be performed from various sources such as [[electronic health record]]s (EHRs), treatment delivery records, [[imaging]] records, patient-generated data records, and administrative and claims data, which then allows for this aggregated data to be analyzed for generating new insights and knowledge that can be used to improve patient care and outcomes.
However, only a few attempts at leveraging existing infrastructure tools used in routine clinical practice to transform the healthcare domain into an LHS have been made. [5, 6] Some examples of actual implementation have emerged, but by and large these concepts have been mostly discussed as conceptual ideas and strategies in the literature. There are several data organization and [[Information management|management]] challenges that must be addressed in order to effectively implement a [[Radiation oncologist|radiation oncology]] LHS:
:1. [[Data integration]]: Radiation oncology data are generated from a variety of sources, including EHRs, imaging systems, treatment planning systems (TPSs), and clinical trials. Integration of this data into a single repository can be challenging due to differences in data formats, terminologies, and storage systems. There is often significant [[Semantics|semantic]] heterogeneity in the way that different clinicians and researchers use terminology to describe radiation oncology data. For example, different institutions may use different codes or terms to describe the same condition or treatment.
:2. Data stored in disparate database schemas: Presently, the EHR, TPS, and treatment management system (TMS) data are housed in a series of [[relational database management system]]s (RDMS), which have rigid database structures, varying data schemas and can include lots of uncoded textual data. Tumor registries also stores data in their own defined schemas. Although the column names in the relational databases between two software products might be the same, semantic meaning based on the application of use may be completely different. Changing a database schema requires a lot of programming effort and code changes because of the rigid structure of the stored data, and it is generally advisable to retire old tables and build new tables with the added column definitions.
:3. [[Linked data|Episodic linking]] of records: Episodic linking of records refers to the process of integrating patient data from multiple encounters or episodes of care into a single comprehensive record. This record includes information about the patient's medical history, diagnosis, treatment plan, and outcomes, which can be used to improve care delivery, research, and education. Linking multiple data sources based on the patients episodic history of care is quite challenging because the heterogeneity of these data sources does not normally follow any common data storage standards.
:4. Build data query tools based on semantic meaning of the data: Since the data are currently stored in multiple RDMSs for the specific purpose to cater the operations aspects of the patient care, extracting common semantic meaning from this data is very challenging. Common semantic meaning in healthcare data is typically achieved through the use of [[Controlled vocabulary|standardized vocabularies]] and ontologies that define concepts and relationships between them. Developing data query tools based on semantic meaning requires a high level of expertise in both the technical and domain-specific aspects of radiation oncology. Moreover, executing complex data queries, which includes tree-based queries, recursive queries, and derived data queries requires multiple tables joining operations in RDMSs, which is a costly operation.
While we are on the cusp of an [[artificial intelligence]] (AI) revolution in [[Biomedical sciences|biomedicine]], with the fast-growing development of advanced [[machine learning]] (ML) methods that can analyze complex datasets, there is an urgent need for a scalable intelligent infrastructure that can support these methods. The radiation oncology domain is also one of the most technically advanced medical specialties, with a long history of electronic data generation (e.g.,  [[Radiation therapy|radiation treatment]] (RT) simulation, treatment planning, etc.) that is modeled for each individual patient. This large volume of patient-specific real-world data captured during routine clinical practice, dosimetry, and treatment delivery make this domain ideally suited for rapid learning. [4] Rapid learning concepts could be applied using an LHS, providing the potential to improve patient outcomes and care delivery, reduce costs, and generate new knowledge from real world clinical and dosimetry data.
Several research groups in radiation oncology, including the University of Michigan, MD Anderson, and Johns Hopkins, have developed data gathering platforms with specific goals. [5] These platforms—such as the M-ROAR platform [6] at the University of Michigan, the system-wide [[electronic data capture]] platform at MD Anderson [7], and the Oncospace program at Johns Hopkins [8]—have been deployed to collect and assess practice patterns, perform outcome analysis, and capture RT-specific data, including dose distributions, organ-at-risk (OAR) information, images, and outcome data. While these platforms serve specific purposes, they rely on relational database-based systems without utilizing standard ontology-based data definitions. However, [[knowledge graph]]-based systems offer significant advantages over these relational database-based systems. Knowledge graph-based systems provide a more integrated and comprehensive representation of data by capturing complex relationships, hierarchies, and semantic connections between entities. They leverage ontologies, which define standardized and structured knowledge, enabling a holistic view of the data and supporting advanced querying and analysis capabilities. Furthermore, knowledge graph-based systems promote data interoperability and integration by adopting standard ontologies, facilitating collaboration and data sharing across different research groups and institutions. As such, knowledge graph-based systems are able to help ensure that research data is more findable, accessible, interoperable, and reusable (FAIR). [22]
In this paper, we set out to contribute to the advancement of the science of LHSs by presenting a detailed description of the technical characteristics and infrastructure that were employed to design a radiation oncology LHS specifically with a knowledge graph approach. The paper also describes how we have addressed the challenges that arise when building such a system, particularly in the context of constructing a knowledge graph. The main contributions of our work are as follows:
:1. Provides an overview of the sources of data within radiation oncology (EHRs, TPS, TMS) and the mechanism to gather data from these sources in a common database.
2. Maps the gathered data to a standardized terminology and data dictionary for consistency and interoperability. Here we describe the processing layer built for data cleaning, checking for consistency and formatting before the extract, transform, and load (ETL) procedure is performed in a common database.
3. Adds concepts, classes, and relationships from existing ''NCI Thesaurus'' and [[SNOMED]] terminologies to the previously published Radiation Oncology Ontology (ROO) to fill in gaps with missing critical elements in the LHS.
4. Presents a knowledge graph visualization that demonstrates the usefulness of the data, with nodes and relationships for easy understanding by clinical researchers.
5. Develops an ontology-based keyword searching tool that utilizes semantic meaning and relationships to search the RDF knowledge graph for similar patients.
6. Provides a valuable contribution to the field of radiation oncology by describing an LHS infrastructure that facilitates data integration, standardization, and utilization to improve patient care and outcomes.
==Material and methocs==
===Gather data from multiple source systems in the radiation oncology domain===





Revision as of 22:38, 9 May 2024

Full article title Infrastructure tools to support an effective radiation oncology learning health system
Journal Journal of Applied Clinical Medical Physics
Author(s) Kapoor, Rishabh; Sleeman IV, William C.; Ghosh, Preetam; Palta, Jatinder
Author affiliation(s) Virginia Commonwealth University
Primary contact rishabh dot kapoor at vcuhealth dot org
Year published 2023
Volume and issue 24(10)
Article # e14127
DOI 10.1002/acm2.14127
ISSN 1526-9914
Distribution license Creative Commons Attribution 4.0 International
Website https://aapm.onlinelibrary.wiley.com/doi/10.1002/acm2.14127
Download https://aapm.onlinelibrary.wiley.com/doi/pdfdirect/10.1002/acm2.14127 (PDF)

Abstract

Purpose: The concept of the radiation oncology learning health system (RO‐LHS) represents a promising approach to improving the quality of care by integrating clinical, dosimetry, treatment delivery, and research data in real‐time. This paper describes a novel set of tools to support the development of an RO‐LHS and the current challenges they can address.

Methods: We present a knowledge graph‐based approach to map radiotherapy data from clinical databases to an ontology‐based data repository using FAIR principles. This strategy ensures that the data are easily discoverable, accessible, and can be used by other clinical decision support systems. It allows for visualization, presentation, and analysis of valuable data and information to identify trends and patterns in patient outcomes. We designed a search engine that utilizes ontology‐based keyword searching and synonym‐based term matching that leverages the hierarchical nature of ontologies to retrieve patient records based on parent and children classes, as well as connects to the Bioportal database for relevant clinical attributes retrieval. To identify similar patients, a method involving text corpus creation and vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) are employed, using cosine similarity and distance metrics.

Results: The data pipeline and tool were tested with 1,660 patient clinical and dosimetry records, resulting in 504,180 RDF (Resource Description Framework) tuples and visualized data relationships using graph‐based representations. Patient similarity analysis using embedding models showed that the Word2Vec model had the highest mean cosine similarity, while the GloVe model exhibited more compact embeddings with lower Euclidean and Manhattan distances.

Conclusions: The framework and tools described support the development of an RO‐LHS. By integrating diverse data sources and facilitating data discovery and analysis, they contribute to continuous learning and improvement in patient care. The tools enhance the quality of care by enabling the identification of cohorts, clinical decision support, and the development of clinical studies and machine learning (ML) programs in radiation oncology.

Keywords: FAIR, learning health system infrastructure, ontology, Radiation Oncology Ontology, Semantic Web

Background and significance

For the past three decades, there is a growing interest in building learning organizations to address the most pressing and complex business, social, and economic challenges facing society today. [1] For healthcare, the National Academy of Medicine has defined the concept of a learning health system (LHS) as an entity where science, incentive, culture, and informatics are aligned for continuous innovation, with new knowledge capture and discovery as an integral part for practicing evidence-based medicine. [2] The current dependency on randomized controlled clinical trials that use a controlled environment for scientific evidence creation with only a small percent (<3%) of patient samples is inadequate now and may be irrelevant in the future since these trials take too much time, are too expensive, and are fraught with questions of generalizability. The Agency for Healthcare Research and Quality has also been promoting the development of LHSs as part of a key strategy for healthcare organizations to make transformational changes to improve healthcare quality and value. Large-scale healthcare systems are now recognizing the need to build infrastructure capable of continuous learning and improvement in delivering care to patients and address critical population health issues. [3] In an LHS, data collection should be performed from various sources such as electronic health records (EHRs), treatment delivery records, imaging records, patient-generated data records, and administrative and claims data, which then allows for this aggregated data to be analyzed for generating new insights and knowledge that can be used to improve patient care and outcomes.

However, only a few attempts at leveraging existing infrastructure tools used in routine clinical practice to transform the healthcare domain into an LHS have been made. [5, 6] Some examples of actual implementation have emerged, but by and large these concepts have been mostly discussed as conceptual ideas and strategies in the literature. There are several data organization and management challenges that must be addressed in order to effectively implement a radiation oncology LHS:

1. Data integration: Radiation oncology data are generated from a variety of sources, including EHRs, imaging systems, treatment planning systems (TPSs), and clinical trials. Integration of this data into a single repository can be challenging due to differences in data formats, terminologies, and storage systems. There is often significant semantic heterogeneity in the way that different clinicians and researchers use terminology to describe radiation oncology data. For example, different institutions may use different codes or terms to describe the same condition or treatment.
2. Data stored in disparate database schemas: Presently, the EHR, TPS, and treatment management system (TMS) data are housed in a series of relational database management systems (RDMS), which have rigid database structures, varying data schemas and can include lots of uncoded textual data. Tumor registries also stores data in their own defined schemas. Although the column names in the relational databases between two software products might be the same, semantic meaning based on the application of use may be completely different. Changing a database schema requires a lot of programming effort and code changes because of the rigid structure of the stored data, and it is generally advisable to retire old tables and build new tables with the added column definitions.
3. Episodic linking of records: Episodic linking of records refers to the process of integrating patient data from multiple encounters or episodes of care into a single comprehensive record. This record includes information about the patient's medical history, diagnosis, treatment plan, and outcomes, which can be used to improve care delivery, research, and education. Linking multiple data sources based on the patients episodic history of care is quite challenging because the heterogeneity of these data sources does not normally follow any common data storage standards.
4. Build data query tools based on semantic meaning of the data: Since the data are currently stored in multiple RDMSs for the specific purpose to cater the operations aspects of the patient care, extracting common semantic meaning from this data is very challenging. Common semantic meaning in healthcare data is typically achieved through the use of standardized vocabularies and ontologies that define concepts and relationships between them. Developing data query tools based on semantic meaning requires a high level of expertise in both the technical and domain-specific aspects of radiation oncology. Moreover, executing complex data queries, which includes tree-based queries, recursive queries, and derived data queries requires multiple tables joining operations in RDMSs, which is a costly operation.

While we are on the cusp of an artificial intelligence (AI) revolution in biomedicine, with the fast-growing development of advanced machine learning (ML) methods that can analyze complex datasets, there is an urgent need for a scalable intelligent infrastructure that can support these methods. The radiation oncology domain is also one of the most technically advanced medical specialties, with a long history of electronic data generation (e.g., radiation treatment (RT) simulation, treatment planning, etc.) that is modeled for each individual patient. This large volume of patient-specific real-world data captured during routine clinical practice, dosimetry, and treatment delivery make this domain ideally suited for rapid learning. [4] Rapid learning concepts could be applied using an LHS, providing the potential to improve patient outcomes and care delivery, reduce costs, and generate new knowledge from real world clinical and dosimetry data.

Several research groups in radiation oncology, including the University of Michigan, MD Anderson, and Johns Hopkins, have developed data gathering platforms with specific goals. [5] These platforms—such as the M-ROAR platform [6] at the University of Michigan, the system-wide electronic data capture platform at MD Anderson [7], and the Oncospace program at Johns Hopkins [8]—have been deployed to collect and assess practice patterns, perform outcome analysis, and capture RT-specific data, including dose distributions, organ-at-risk (OAR) information, images, and outcome data. While these platforms serve specific purposes, they rely on relational database-based systems without utilizing standard ontology-based data definitions. However, knowledge graph-based systems offer significant advantages over these relational database-based systems. Knowledge graph-based systems provide a more integrated and comprehensive representation of data by capturing complex relationships, hierarchies, and semantic connections between entities. They leverage ontologies, which define standardized and structured knowledge, enabling a holistic view of the data and supporting advanced querying and analysis capabilities. Furthermore, knowledge graph-based systems promote data interoperability and integration by adopting standard ontologies, facilitating collaboration and data sharing across different research groups and institutions. As such, knowledge graph-based systems are able to help ensure that research data is more findable, accessible, interoperable, and reusable (FAIR). [22]

In this paper, we set out to contribute to the advancement of the science of LHSs by presenting a detailed description of the technical characteristics and infrastructure that were employed to design a radiation oncology LHS specifically with a knowledge graph approach. The paper also describes how we have addressed the challenges that arise when building such a system, particularly in the context of constructing a knowledge graph. The main contributions of our work are as follows:

1. Provides an overview of the sources of data within radiation oncology (EHRs, TPS, TMS) and the mechanism to gather data from these sources in a common database.

2. Maps the gathered data to a standardized terminology and data dictionary for consistency and interoperability. Here we describe the processing layer built for data cleaning, checking for consistency and formatting before the extract, transform, and load (ETL) procedure is performed in a common database.

3. Adds concepts, classes, and relationships from existing NCI Thesaurus and SNOMED terminologies to the previously published Radiation Oncology Ontology (ROO) to fill in gaps with missing critical elements in the LHS.

4. Presents a knowledge graph visualization that demonstrates the usefulness of the data, with nodes and relationships for easy understanding by clinical researchers.

5. Develops an ontology-based keyword searching tool that utilizes semantic meaning and relationships to search the RDF knowledge graph for similar patients.

6. Provides a valuable contribution to the field of radiation oncology by describing an LHS infrastructure that facilitates data integration, standardization, and utilization to improve patient care and outcomes.

Material and methocs

Gather data from multiple source systems in the radiation oncology domain

Abbreviations, acronyms, and initialisms

Acknowledgements

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage was substantially updated for improved readability. In some cases important information was missing from the references, and that information was added.