Journal:Using knowledge graph structures for semantic interoperability in electronic health records data exchanges

From LIMSWiki
Full article title Using knowledge graph structures for semantic interoperability in electronic health records data exchanges
Journal Information
Author(s) Sachdeva, Shelly; Bhalla, Subhash
Author affiliation(s) National Institute of Technology Delhi, University of Aizu
Primary contact Email: shellysachdeva at nitdelhi dot ac dot in
Year published 2022
Volume and issue 13(2)
Article # 52
DOI 10.3390/info13020052
ISSN 2078-2489
Distribution license Creative Commons Attribution 4.0 International
Website https://www.mdpi.com/2078-2489/13/2/52/htm
Download https://www.mdpi.com/2078-2489/13/2/52/pdf (PDF)

Abstract

Information sharing across medical institutions is restricted to information exchange between specific partners. The structure and content of the lifelong electronic health record (EHR) require standardization efforts. Existing standards such as openEHR, Health Level 7 (HL7), and CEN TC251 EN 13606 (Technical committee on Health Informatics of the European Committee for Standardization) aim to achieve data independence along with semantic interoperability. This study aims to identify a knowledge representation suitable for semantic health data exchange. openEHR and CEN TC251 EN 13606 use archetype-based technology for semantic interoperability. The HL7 Clinical Document Architecture is moving toward adopting this approach through HL7 templates. Archetypes are the basis for knowledge-based systems, as they are a means of defining clinical knowledge.

The paper examines a set of formalisms for their suitability for describing, representing, and reasoning about archetypes. Each of the information exchange technologies, such as XML, Web Ontology Language (OWL), Object Constraint Language (OCL), and Knowledge Interchange Format (KIF), is evaluated as part of the knowledge representation experiment, in terms of how well it represents archetypes as described by the Archetype Definition Language (ADL). The evaluation maintains a clear focus on the syntactic and semantic transformations among different EHR standards.

Keywords: archetypes, electronic health records, dual-model approach, knowledge representation, EHR, XML, ADL, OWL, KIF

Introduction

Healthcare is a continuously evolving domain, in which new findings about diseases and clinical treatments are constantly being made. This has raised the need for increased information exchange among medical institutions. Electronic health records (EHRs) contain the medical history and treatments of patients at those institutions. In the classical approach, information and knowledge are stored together. However, storing each clinical concept in a single relation led to a huge data model that was difficult to manage and expensive to maintain. Among the existing interoperability approaches for EHRs, the dual-model approach [1] seems the most promising. It consists of an information layer and a knowledge layer. The key benefit of this approach is the separation of knowledge (represented as archetypes [2]) from information. A clinical concept is thus, in effect, transferred through an intermediate knowledge graph structure. This study analyzes the components of that knowledge graph.

Knowledge graphs are used to capture knowledge in application settings that require large-scale integration, management, and extraction of value from a variety of data sources. Recent studies survey the currently available knowledge graphs (KGs), including their characteristics, approaches, applications, issues, and challenges. [3,4]

This paper focuses on using knowledge representation and information interchange technologies for archetype representation.

Overview of EHRs

The domain of the modern EHR is complex. It includes many types of data (from text to multimedia), and new data requirements emerge over time. For example, there are currently about 300,000 medical terms (as defined by SNOMED CT), and medical tests and procedures are constantly created and modified. EHRs have a complex structure based on archetypes, which may cover hundreds of parameters, such as temperature, blood pressure, and body mass index (BMI). Each individual parameter (or concept) has its own specific content and is represented as an archetype. For example, an archetype could contain an item such as "data," which might hold a documented heart rate observation. Such an archetype ideally offers complete knowledge about the clinical context (i.e., attributes of the data), the data's "state" (i.e., context for interpreting the data), and its "protocol" (i.e., information about how the data were gathered) (see Appendix A). Various standards development organizations are working to improve the semantic interoperability of EHRs through these archetypes and other means.
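To make the three-part structure concrete, the following is a minimal Python sketch of an observation split into data, state, and protocol. The class `Observation` and the field names (`rate`, `position`, `method`) are hypothetical illustrations, not the openEHR reference model classes, which carry many more attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hypothetical, simplified stand-in for an archetyped observation."""
    data: dict                                    # the recorded values
    state: dict = field(default_factory=dict)     # context for interpreting the data
    protocol: dict = field(default_factory=dict)  # how the data were gathered

    def describe(self) -> str:
        """Summarize only the parts that were actually filled in."""
        parts = [f"data={self.data}"]
        if self.state:
            parts.append(f"state={self.state}")
        if self.protocol:
            parts.append(f"protocol={self.protocol}")
        return "; ".join(parts)


# A heart rate reading expressed through the three parts named in the text.
heart_rate = Observation(
    data={"rate": 72, "units": "/min"},
    state={"position": "sitting"},
    protocol={"method": "pulse oximeter"},
)
print(heart_rate.describe())
```

Keeping state and protocol separate from the data lets a receiving system interpret the same value (72/min) differently depending on, for example, patient position.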

It is desirable for EHR systems to be both functionally and semantically interoperable. Interoperability can be defined as the ability to communicate data such that the data are sufficient to perform tasks at the receiving system. The associated data items must have the same meaning for the creator at the sending party and the users at the receiving party, and the tasks performed using the data must satisfy the receiving party. To tackle the EHR interoperability problem, several recognized organizations have defined standards. Examples include Health Level 7 (HL7) and its Clinical Document Architecture (CDA), ASTM International's Continuity of Care Record (CCR), the European Committee for Standardization (CEN) Technical Committee 251 and International Organization for Standardization (ISO)'s ISO/EN 13606, and the openEHR Foundation's openEHR. The main objective of all these EHR standards is to structure the data and mark up the content of medical information so that it can be exchanged more readily.

For this work, three levels of interoperability stand out, namely syntactic (data) interoperability, structural interoperability/semantic interpretability, and semantic interoperability. [5] The main mechanisms for interoperability are reference models, archetypes, and domain knowledge governance.

Archetypes and semantic interoperability

The standards for semantic interoperability (such as CDA, openEHR, and ISO/EN 13606) endorse the two-level modeling approach for storing EHR content. [6] Its two layers separate information modeling from content (knowledge) modeling. The reference (information) model layer represents the generic structures of the components of healthcare data. The content model, on the other hand, represents more domain-specific data, which are generally unstable because of their variability and the high rate of change in their usage (e.g., a formal description of a physical examination or a prescription).

In openEHR and ISO/EN 13606, the first level is known as the reference model (RM), and the second level consists of archetypes. The RM defines the fundamental structure and represents the generic structures of the components of healthcare data at the storage level (i.e., information modeling). At the second level, the archetype model (AM) constrains these generic structures to encode logical semantics and, thus, provides a standard definition that aids semantic interoperability. The AM's deliverables are archetypes and templates. An archetype provides the meta-description of structured clinical records as a computable formalism. In HL7's CDA, the two levels are the Reference Information Model (RIM) and the HL7 templates, which function essentially the same as archetypes.

These standards are compatible with one another. In the case of openEHR and ISO/EN 13606, the only means of achieving interoperability with a generic information model is through their archetypes. In fact, ADL archetypes can be defined against any Unified Modeling Language (UML) model, and it is also possible to write archetypes against the HL7 Version 3 RIM and the CDA in general. Kilic and Dogac [7] note this in their work, describing how the clinical statements of two different EHR standards derived from the same RIM can be mapped to each other by using archetypes, Refined Message Information Model derivations, and semantic tools.

As mentioned earlier, an archetype is an agreed-upon, formal, and interoperable specification of a reusable clinical data set that underpins an EHR. It captures the maximum possible information about a particular, discrete clinical concept. [2] Under the dual-model approach, which pairs a knowledge layer of archetypes with a reference model, a conceptual definition of data can be developed as an archetype in terms of constraints on the structure, types, values, and behaviors of reference model classes. An example of a simple archetype is "Weight," which can be used in multiple places as required within an EHR.
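The idea of an archetype as constraints on a generic reference model class can be sketched as follows, assuming a simplified `Quantity` class standing in for an RM data type and a hypothetical "Weight" archetype expressed as a Python dictionary. The allowed units and value ranges are illustrative assumptions, not values taken from a published archetype.

```python
from dataclasses import dataclass


@dataclass
class Quantity:
    """Generic RM-style class: knows nothing about weight specifically."""
    magnitude: float
    units: str


# Hypothetical "Weight" archetype: constrains the RM type, the allowed units,
# and the value range per unit. The limits are illustrative only.
WEIGHT_ARCHETYPE = {
    "rm_type": Quantity,
    "units": {"kg": (0.0, 1000.0), "lb": (0.0, 2000.0)},
}


def validate(value, archetype) -> list[str]:
    """Return a list of constraint violations (empty if the value conforms)."""
    rm_type = archetype["rm_type"]
    if not isinstance(value, rm_type):
        return [f"expected {rm_type.__name__}, got {type(value).__name__}"]
    limits = archetype["units"].get(value.units)
    if limits is None:
        return [f"units {value.units!r} not allowed"]
    low, high = limits
    if not low <= value.magnitude <= high:
        return [f"magnitude {value.magnitude} outside [{low}, {high}]"]
    return []


print(validate(Quantity(70.0, "kg"), WEIGHT_ARCHETYPE))     # []
print(validate(Quantity(70.0, "stone"), WEIGHT_ARCHETYPE))  # ["units 'stone' not allowed"]
```

The `Quantity` class stays unchanged; only the dictionary would change if clinicians revised the archetype, which mirrors the stability split between the RM and archetypes.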

Semantics in archetypes have a dual nature. They consist of both structural and terminological components. The structure of an archetype provides support for semantics, while EHR component links form a set of interrelated conceptual, clinical entities. Each entity has a set of terminological bindings associated with it (specified by links to terms of specific medical terminologies).

If data elements are created and modified using archetypes, the archetypes constrain the configuration of data instances so that they are valid according to the archetype. Archetypes are thus a paradigm for building semantically enabled software systems, providing data validation, clinical modeling (by domain experts), a basis for querying, and form design. An archetype might define or constrain relationships between data values within a data structure; these are expressed as algorithms, formulas, or rules. An archetype's metadata defines its core concept, purpose, use, evidence, authorship, and versioning. An archetype also defines a maximal dataset, containing all the relevant information about a clinical concept. Once the format of an archetype is agreed upon and published, it is held in a "library" and made available for use in any part of a given application, by multiple vendor systems, institutions, and geographical regions. Each group or entity using the same archetype will understand and be able to compute on data captured by that archetype in another clinical environment. Thus, an archetype serves the following key purposes [8,9]:

  1. It allows domain experts (clinicians) to capture data for their information systems.
  2. It provides runtime validation of data input, thus improving data entry quality.
  3. It provides a basis for intelligent querying of data.
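As a sketch of the runtime validation and rule checking described above, the following assumes a hypothetical archetype rule relating weight, height, and BMI (BMI = weight / height²). The function names, field names, and tolerance are assumptions made for illustration; they do not reproduce ADL's own rule syntax.

```python
def bmi_rule_holds(weight_kg: float, height_m: float, bmi: float, tol: float = 0.1) -> bool:
    """Hypothetical archetype rule: recorded BMI must equal weight / height^2 (within tol)."""
    expected = weight_kg / height_m ** 2
    return abs(expected - bmi) <= tol


def check_entry(entry: dict) -> list[str]:
    """Validate a data entry at input time, returning a list of error messages."""
    missing = [key for key in ("weight_kg", "height_m", "bmi") if key not in entry]
    if missing:
        return [f"missing {key}" for key in missing]
    if entry["height_m"] <= 0:
        return ["height_m must be positive"]
    if not bmi_rule_holds(entry["weight_kg"], entry["height_m"], entry["bmi"]):
        return ["bmi inconsistent with weight and height"]
    return []


print(check_entry({"weight_kg": 70, "height_m": 1.75, "bmi": 22.9}))  # []
print(check_entry({"weight_kg": 70, "height_m": 1.75, "bmi": 25.0}))
```

Rejecting an inconsistent BMI at the point of entry, rather than after storage, is what the text means by improving data entry quality.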

Representing internal data in archetypes

Matching clinical data to codes in controlled terminologies is the first step toward achieving data standardization for safe and accurate data interoperability. Archetypes have the advantage of separating the internal model data from formal terminologies. Existing terminologies, taxonomies, and ontologies have been written in many languages. For example, Medical Subject Headings (MeSH) [10] and the National Cancer Institute (NCI) [11] have their own proprietary formalisms (now also commonly expressed in XML). The "term binding" section of an archetype describes the equivalences between archetype local terms and terms in external terminologies, such as SNOMED CT or the Unified Medical Language System (UMLS). The internal data are assigned local names, which are later bound or mapped to external terminology codes. This feature eliminates the need to change the model whenever the terminology changes. For formal descriptions, the Archetype Definition Language (ADL) uses three syntaxes, cADL (constraint form of ADL), dADL (data definition form of ADL), and a version of first-order predicate logic (FOPL), to describe constraints on data that are instances of some information model (e.g., expressed in UML). [12] Thus, ADL can be used to write archetypes for any domain in which formal object models exist to describe data instances.
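The separation between local terms and external terminology codes can be sketched as follows. The at-codes mimic ADL's local term naming style, but the bindings use placeholder strings rather than real SNOMED CT or LOINC codes, and the `resolve` helper is hypothetical. Data instances store only local codes, so a terminology binding can change without touching stored data.

```python
# Local terms defined inside a hypothetical archetype (ADL-style at-codes).
LOCAL_TERMS = {
    "at0000": "Body weight",
    "at0004": "Weight",
}

# Bindings from local codes to external terminologies.
# PLACEHOLDER values only: these are NOT real SNOMED CT or LOINC codes.
TERM_BINDINGS = {
    "SNOMED-CT": {"at0000": "SCT-PLACEHOLDER-1"},
    "LOINC": {"at0000": "LOINC-PLACEHOLDER-1"},
}


def resolve(local_code: str, terminology: str, bindings: dict = TERM_BINDINGS):
    """Return (local text, external code or None) for a local archetype code."""
    text = LOCAL_TERMS[local_code]
    external = bindings.get(terminology, {}).get(local_code)
    return text, external


# A stored data instance refers only to the local code.
record = {"archetype_node_id": "at0004", "magnitude": 70.0, "units": "kg"}

# Re-binding to a different external code leaves `record` untouched.
new_bindings = {**TERM_BINDINGS, "SNOMED-CT": {"at0000": "SCT-PLACEHOLDER-2"}}
print(resolve("at0000", "SNOMED-CT"))                # ('Body weight', 'SCT-PLACEHOLDER-1')
print(resolve("at0000", "SNOMED-CT", new_bindings))  # ('Body weight', 'SCT-PLACEHOLDER-2')
```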

EHRs and data modeling

The openEHR architecture [1] includes a design principle called "ontological separation," which governs EHR modeling (Figure 1). The model consists of two main categories: "ontologies of information" and "ontologies of reality." The ontologies of information contain the information models of the EHR content, whereas the ontologies of reality describe real phenomena with descriptions and classifications.



Fig. 1 Illustration of openEHR’s ontological structure

The ontologies of information are divided into several models:

  1. Domain content models (knowledge models) containing formal definitions of the clinical content. These are developed using archetypes, which are designed to change as new clinical needs arise.
  2. Information representation models, which are implemented in the electronic healthcare system's software. These serve as a foundation for the domain content models and are designed to remain stable as those models change. In openEHR, this component is named the "reference model" (RM).

In simpler terms, if the RM is likened to an alphabet of letters and digits, then each archetype is a grammar constraining which strings can be expressed for that archetype. In formal terms:

RM = {Set of classes C1, C2, C3 ...Cn}
Archetype = {Set of rules for valid combination of classes of RM}
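The alphabet-and-grammar analogy can be sketched in Python: a hypothetical mini RM supplies the set of classes, and an archetype is a set of rules stating which classes may be combined under which. The class names loosely echo openEHR naming, but the hierarchy is heavily simplified (real openEHR observations place further history, event, and item structures between OBSERVATION and ELEMENT).

```python
# Hypothetical mini reference model: the "alphabet" of generic classes.
RM_CLASSES = {"COMPOSITION", "SECTION", "OBSERVATION", "ELEMENT", "QUANTITY", "TEXT"}

# Hypothetical archetype: the "grammar" stating which RM classes may appear
# as children of each class. Classes absent from the dict may have no children.
ARCHETYPE_RULES = {
    "COMPOSITION": {"SECTION"},
    "SECTION": {"OBSERVATION"},
    "OBSERVATION": {"ELEMENT"},
    "ELEMENT": {"QUANTITY"},  # this archetype allows quantities, not free text
}


def conforms(node: tuple) -> bool:
    """Check a (class_name, children) tree against the RM and the archetype rules."""
    cls, children = node
    if cls not in RM_CLASSES:
        return False  # not expressible in the RM's "alphabet" at all
    allowed = ARCHETYPE_RULES.get(cls, set())
    return all(child[0] in allowed and conforms(child) for child in children)


valid = ("COMPOSITION", [("SECTION", [("OBSERVATION", [("ELEMENT", [("QUANTITY", [])])])])])
free_text = ("COMPOSITION", [("SECTION", [("OBSERVATION", [("ELEMENT", [("TEXT", [])])])])])
print(conforms(valid), conforms(free_text))  # True False
```

Both trees use only RM classes, so both are valid "strings" over the alphabet; only the first is also valid under this archetype's grammar.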

The ontologies of reality can be broken down as:

  1. Classifications: ICDx (International Classification of Diseases) and ICPC (International Classification of Primary Care)
  2. Process descriptions: Clinical guidelines
  3. Descriptive terminologies: SNOMED CT or LOINC

In Figure 1, the EHR extracts are based on commonly shared archetypes, which are proposed as a means of exchanging information between different healthcare providers. [1] The semantics of the domain content models (e.g., archetypes) are provided by terminology binding. The meaning of nodes in archetypes is given by textual descriptions and references to external terminology systems, in the form of term definitions and term bindings. This paper describes the representation of archetypes in several formalisms, such as ADL, XML (Extensible Markup Language), and OWL (Web Ontology Language). It then compares these information exchange technologies and analyzes the advantages and disadvantages of each. The main aim of this study is to find the best representation of an archetype. The paper examines a set of formalisms for their suitability for describing, representing, and reasoning about archetypes.

Formalisms in transformation of archetypes among different standards: Background

Prevalent standards such as CDA, openEHR, and ISO/EN 13606 use archetype-based technology for semantically interoperable exchange. ADL archetypes can be written against any UML model, so it would be possible to write archetypes directly against the RIM [13] and also against the CDA specification [14] (using a UML expression derived from its XML schema). These standards define the structure and the markup of clinical content to make EHR exchange interoperable. They all rely on dual-model technology for semantic interoperability. Their reference models contain different classes with abstract semantics; although the class names are not shared between the standards, their semantics are similar. The RM of each standard is stable. If archetypes of different standards are aligned, alignment algorithms based on name-similarity measures will fail, because the RM class names are disparate. Dictionary-based approaches will not help much either, as all the names are quite abstract. [15] The various ongoing archetype-based exchange efforts use different formalisms to represent archetypes. Mappings among the standards rely on model management, OWL transformations, and XML representations of archetypes.
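Why name-based similarity measures break down can be illustrated with Python's standard `difflib.SequenceMatcher`. The sketch assumes, for illustration only, that openEHR's COMPOSITION corresponds roughly to CDA's ClinicalDocument, while CDA's Section corresponds to openEHR's SECTION. A purely lexical matcher nonetheless pairs COMPOSITION with Section.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio of two class names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name: str, candidates: list[str]) -> str:
    """Pick the candidate whose name is lexically most similar to `name`."""
    return max(candidates, key=lambda c: name_similarity(name, c))


# Assumed correspondence for this sketch: COMPOSITION ~ ClinicalDocument.
cda_classes = ["ClinicalDocument", "Section"]
for candidate in cda_classes:
    print(candidate, round(name_similarity("COMPOSITION", candidate), 2))
print(best_match("COMPOSITION", cda_classes))  # Section
```

The shared suffix "-tion" outweighs any semantic relationship, which is the failure mode the text attributes to similarity-based alignment of RM classes.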

For interoperability among EHR standards based on ontologies, the possible approaches are (i) building a common ontology and (ii) reusing existing ontologies and combining them. The first approach requires transforming ADL archetypes into OWL. [16,17] The ontological information and archetype models have been compared to find similarities and differences between the CEN and openEHR representations. The Poseacle converter is a software tool based on this approach. [18] The second approach involves XML and OWL representations, with the latter obtained by reusing existing ontologies and combining them through ontology mapping. The ARTEMIS project was implemented on this basis, using formalisms such as XML, ADL, and OWL. [19]
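The first approach (transforming ADL archetypes into OWL) can be sketched as generating an OWL class, in Turtle syntax, that is a subclass of its RM class plus an existential restriction for each constrained attribute. This is a hypothetical, minimal mapping: the `ex:` namespace and the property and class names are assumptions, and it is not the mapping used by the Poseacle converter or in the cited work.

```python
def archetype_to_owl(name: str, rm_class: str, attributes: dict, prefix: str = "ex") -> str:
    """Emit Turtle declaring `name` as an OWL subclass of `rm_class` with
    one owl:someValuesFrom restriction per constrained attribute."""
    superclasses = [f"{prefix}:{rm_class}"]
    for prop, value_class in attributes.items():
        superclasses.append(
            f"[ a owl:Restriction ; owl:onProperty {prefix}:{prop} ; "
            f"owl:someValuesFrom {prefix}:{value_class} ]"
        )
    header = (
        f"@prefix {prefix}: <http://example.org/archetypes#> .\n"  # hypothetical namespace
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
    )
    body = (
        f"{prefix}:{name} a owl:Class ;\n"
        "    rdfs:subClassOf " + " ,\n        ".join(superclasses) + " .\n"
    )
    return header + body


# Hypothetical archetype: a body weight observation that must have a weight quantity.
print(archetype_to_owl("BodyWeightObservation", "OBSERVATION", {"hasWeight": "WeightQuantity"}))
```

Once archetypes are classes in a shared OWL ontology, a reasoner can check subsumption between archetypes from different standards, which is the motivation for building a common ontology.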

Broadly speaking, archetypes must be agreed upon before communication. However, it does not seem feasible to expect professionals of various disciplines to agree on every detail of the archetypes associated with the data they would like to exchange. If this approach becomes widely accepted, the number of available archetypes will almost certainly become very large. Although archetypes are annotated with terms from standardized ontologies (terminologies, taxonomies, etc.), differences will remain at both the archetype level and the terminology level. Because of competing standards, local variations at the archetype level will stem from the specialization of archetypes for specific purposes and research projects. Further, several widely used terminologies could be used to annotate archetypes (e.g., SNOMED CT, MeSH, NCI). Local ontologies are also used to annotate archetypes; therefore, a sound and general process for matching archetypes is essential.

In 2019, Adel et al. [20] proposed a unified framework based on a fuzzy ontology to show how to exploit semantic web technologies to support EHR semantic interoperability. Prior to that, Martinez-Costa et al. [21] introduced ontologies and rules as a means of establishing interoperability amongst heterogeneous health systems (openEHR and HL7). Recently, Roehrs et al. [22] proposed an application model called OmniPHR, a model for assessing the structure of semantic interoperability and database integration from various health standards. OmniPHR uses artificial intelligence, natural language processing, and a standard ontology to achieve interoperability. Knowledge graphs are represented using a variety of methodologies, but machine learning methods are frequently employed to create a low-dimensional representation that can support a wide range of applications. [23]

There is therefore a need to assess the suitability of various formalisms for achieving full semantic interoperability through archetypes, which the current research addresses.

Knowledge representation

In a 2019 review of more than 100 papers on knowledge representation in health care, Riaño et al. [24] found that ontologies (31%), semantic web-related formalisms (26%), decision tables and rules (19%), logic (14%), and probabilistic models (10%) were the most common knowledge representation approaches. They also found that medical informatics knowledge was primarily represented as computer-interpretable clinical recommendations (43%), medical domain ontologies (26%), and EHRs (22%). [24] In most of these cases, the meaning of the concepts represented by an archetype can be conveyed by embedding codes from a commonly recognized terminology at appropriate points in the archetype. Archetypes are the unit of communication between interoperating applications, as they define the minimum context that must be considered for safe communication. [15] Expressivity is a key parameter in choosing or creating a knowledge representation (KR). It is easier and more compact to express a fact or element of knowledge within the semantics and grammar of a more expressive KR. However, more expressive languages are likely to require more complex logic and algorithms to construct equivalent inferences. A highly expressive KR is also less likely to be complete and consistent, whereas less expressive KRs may be both complete and consistent. [25] This section describes the knowledge technologies for representing archetypes.

ADL and XML

The ADL approach uses existing UML semantics and existing terminologies and adds a convenient syntax for expressing the required constraints. Expressing the semantics of archetypes using XML-based exchange formats conflates abstract and concrete representational semantics. [12] ADL syntax is straightforward and powerful, and it has allowed mappings to other formalisms to be defined and understood more correctly. Previously, archetypes were expressed as XML instance documents conforming to W3C XML schemas [26], for example, in the Good Electronic Health Record (GEHR) [27] and openEHR projects. Subsequently, expressing archetype constraints using various XML schema languages (such as XML Schema, RELAX NG, and Schematron) was examined. Because of the issues reported, these languages were abandoned for archetype validation. [28,29] For example, in XML Schema, classes in the RM were mapped to complex types, and archetypes were mapped to restrictions of those types. The strict rules governing XML Schema's restriction feature (unique particle attribution, complex enumerations, and the placement of regular-expression constraints) did not permit archetype constraints to be implemented.

With ADL parsing tools, it is possible to convert ADL into any number of forms, including various XML formats. XML instances can be generated from the in-memory object form of an archetype. XML archetypes are equivalent to serialized instances of the parse tree, i.e., particular ADL archetypes serialized from objects into XML instances. Archetypes connect information structures to formal terminologies. Like XML data, they are path-addressable, using path expressions that are directly convertible to XPath expressions. An XML schema corresponding to the ADL object model has been published at openEHR.org. [30] The XML schema corresponding to the RM is published in the work of Martinez-Costa et al. [21]
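The correspondence between archetype paths and XPath can be sketched as follows, assuming an XML serialization in which each archetyped node carries an `archetype_node_id` attribute (as in openEHR's XML schemas) and using the limited XPath subset supported by Python's `xml.etree.ElementTree`. The sample document, its node IDs, and the helper `archetype_path_to_xpath` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# One path step: an attribute name, optionally followed by [atNNNN] node id.
SEGMENT = re.compile(r"^(\w+)(?:\[(at[\d.]+)\])?$")


def archetype_path_to_xpath(path: str) -> str:
    """Turn /name[atNNNN]/... into ElementTree-compatible relative XPath."""
    steps = []
    for seg in path.strip("/").split("/"):
        match = SEGMENT.match(seg)
        if not match:
            raise ValueError(f"unsupported path segment: {seg!r}")
        name, node_id = match.groups()
        steps.append(f"{name}[@archetype_node_id='{node_id}']" if node_id else name)
    return "./" + "/".join(steps)


# Hypothetical, simplified heart rate observation instance.
SAMPLE = """
<observation archetype_node_id="openEHR-EHR-OBSERVATION.example.v1">
  <data archetype_node_id="at0001">
    <events archetype_node_id="at0002">
      <data archetype_node_id="at0003">
        <items archetype_node_id="at0004">
          <value><magnitude>72</magnitude><units>/min</units></value>
        </items>
      </data>
    </events>
  </data>
</observation>
"""

root = ET.fromstring(SAMPLE.strip())
xpath = archetype_path_to_xpath("/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value/magnitude")
print(xpath)
print(root.find(xpath).text)  # 72
```

Because node IDs become attribute predicates, the same archetype path addresses the value regardless of how many sibling nodes the instance contains.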

XML plays a role in meeting exchange requirements: the EHR_Extract, used for exchange, is expressed in XML. [1] A recent study examines whether the W3C XML schema language provides a practicable solution for the semantic validation of standard-based EHR documents. [28] The EHR_Extract needs to be validated against the RM and the associated archetypes.

An example of XML/ADL use can be found in openEHR. To accept a report from a pathology laboratory for inclusion in a patient's EHR repository (in ADL form), an XML form is generated from the archetype. This form is shared with the laboratory for on-site validation of data input. Thus, XML serves as an input and transport medium.
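The pathology workflow above can be sketched as: generate an empty XML form from an archetype, let the laboratory fill it in, and check each value against the constraints the form carries. The archetype dictionary, element names, attribute names (`units`, `min`, `max`), and ranges are all hypothetical; a real openEHR form would be derived from the RM and archetype XML schemas rather than built this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical lab-result archetype: field name -> (units, min, max).
LAB_ARCHETYPE = {
    "haemoglobin": ("g/L", 50.0, 250.0),
    "platelets": ("10*9/L", 10.0, 1000.0),
}


def generate_form(archetype: dict) -> ET.Element:
    """Build an empty XML form whose elements carry the archetype constraints."""
    form = ET.Element("lab_report")
    for name, (units, low, high) in archetype.items():
        ET.SubElement(form, name, units=units, min=str(low), max=str(high))
    return form


def validate_form(form: ET.Element) -> list[str]:
    """Check each filled-in value against the constraints stored on its element."""
    errors = []
    for el in form:
        text = (el.text or "").strip()
        try:
            value = float(text)
        except ValueError:
            errors.append(f"{el.tag}: not a number: {text!r}")
            continue
        if not float(el.get("min")) <= value <= float(el.get("max")):
            errors.append(f"{el.tag}: {value} outside [{el.get('min')}, {el.get('max')}]")
    return errors


# The laboratory fills in the form it received, then validates on site.
form = generate_form(LAB_ARCHETYPE)
form.find("haemoglobin").text = "135"
form.find("platelets").text = "5"
print(validate_form(form))  # ['platelets: 5.0 outside [10.0, 1000.0]']
print(ET.tostring(form, encoding="unicode"))  # XML sent back for inclusion in the EHR
```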

A comparison between ADL and XML is stated below, and in Appendix B:

  1. Both are machine processable.
  2. ADL is human readable, whereas XML is sometimes difficult to read (e.g., XML schema instances, OWL-RDF ontologies).
  3. ADL adheres to object-oriented semantics, particularly for container types, whereas XML schema languages do not follow object-oriented semantics.
  4. For ontological reference, ADL has domain entities/archetypes, and XML has global terms/concepts.
  5. ADL uses attributes, and XML uses attributes and sub-elements to represent object properties.
  6. ADL uses nearly half of its required storage space for tags, whereas XML may contain redundant data.
  7. In terms of efficiency, ADL is a domain-specific language (sufficiently rich to capture and model the medical domain), whereas XML is suited to web document modeling but has limited ability to represent database contents.
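To illustrate items 5 and 6, the sketch below serializes the same small object in a simplified dADL-style syntax and as XML sub-elements. The serializers `to_dadl` and `to_xml` are hypothetical helpers that handle only nested dictionaries, numbers, and strings (dADL strings are not escaped), so the size comparison applies to this toy example only and does not measure the storage figures cited above.

```python
from xml.sax.saxutils import escape


def to_dadl(obj: dict, indent: int = 0) -> str:
    """Serialize nested dicts in a simplified dADL style: key = <value>."""
    pad = "    " * indent
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key} = <")
            lines.append(to_dadl(value, indent + 1))
            lines.append(f"{pad}>")
        elif isinstance(value, str):
            lines.append(f'{pad}{key} = <"{value}">')
        else:
            lines.append(f"{pad}{key} = <{value}>")
    return "\n".join(lines)


def to_xml(obj: dict, indent: int = 0) -> str:
    """Serialize nested dicts as XML sub-elements (closing tags repeat names)."""
    pad = "    " * indent
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}<{key}>")
            lines.append(to_xml(value, indent + 1))
            lines.append(f"{pad}</{key}>")
        else:
            lines.append(f"{pad}<{key}>{escape(str(value))}</{key}>")
    return "\n".join(lines)


record = {"value": {"magnitude": 72.0, "units": "/min"}}
dadl, xml = to_dadl(record), to_xml(record)
print(dadl)
print(xml)
print(len(dadl), len(xml))  # 55 72
```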

ADL and OWL

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. The original references are in alphabetical order; this version places them in order of appearance, by design.