Difference between revisions of "Journal:Using knowledge graph structures for semantic interoperability in electronic health records data exchanges"

From LIMSWiki


==Evaluation in practice==
We ran into many problems concerning alternative representations of the archetype and the inability to express some of the constraints in different knowledge representation languages. We detail these problems as a cautionary tale to others planning to use pre-existing archetypes for semantic interoperability, as a list of issues to consider when describing concepts formally in any language, and as a collection of criteria for evaluating alternative representations.


The three knowledge formalisms best suited to archetype representation (ADL, XML, and OWL) were evaluated against the features discussed in this paper using the experimental setup KnowledgeRep. ADL was compared with XML and OWL to determine the appropriate role of each formalism in a working, semantically interoperable system.
===KnowledgeRep: Simulation===
In the current research, the knowledge formalisms that different EHR standards use to achieve semantic interoperability have been examined with the openEHR standard. The evaluation setup, referred to as KnowledgeRep, is built on the openEHR Java Reference Implementation [19] and is used to examine various archetype representations of EHR concepts. Figure 2 shows the application GUI (graphical user interface), the knowledge representations, and the communication classes. The application GUI handles the interactions between users and the EHR system. It makes calls to the utility and server-side classes that format and transform the data and extract the relevant parts. It comprises the following modules: the form presented to the user, the template mechanism that creates the template corresponding to the EHR system, the data binding module, and the result viewer, which presents the report requested by the user. GWT (Google Web Toolkit) supports all of these modules and introduces the necessary flexibility into the model-driven approach to standards-based EHR system development.
[[File:Fig2 Sachdeva Information22 13-2.png|1000px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="1000px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Fig. 2''' KnowledgeRep: Evaluation setup for various archetype representations for EHR concepts</blockquote>
|-
|}
|}
The second part of the diagram shows the knowledge representations (archetypes) in the ADL, XML, and OWL formalisms, together with the communication classes. This part contains the components responsible for communication, as well as the logic that connects the GUI with the underlying openEHR Java Reference Implementation. The entire body of knowledge is parsed from the ADL/XML/OWL files using the RM classes and mapped to the GUI (form). The knowledge representation and communication part of the setup comprises the <tt>ArchetypeWrapper</tt> and <tt>DatabaseWrapper</tt> classes, which act as the communication nodes, while <tt>ArchetypeDAO</tt> maps between the database and the basic RM data types. The clinical software database is also shown, although its details are beyond the scope of the current research.
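To illustrate the kind of mapping between basic RM data types and stored values that a class such as <tt>ArchetypeDAO</tt> performs, here is a minimal Python sketch. It is not the KnowledgeRep code, which is Java. The function `to_rm_value`, the `DvQuantity` class, and the assumed "magnitude units" input format are hypothetical. Only the RM type names DV_TEXT, DV_COUNT, DV_BOOLEAN, and DV_QUANTITY are taken from openEHR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DvQuantity:
    """Simplified, hypothetical stand-in for the openEHR DV_QUANTITY data type."""
    magnitude: float
    units: str


def _parse_boolean(raw: str) -> bool:
    # Accept only explicit true/false so that typos are rejected, not coerced.
    lowered = raw.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean: {raw!r}")
    return lowered == "true"


def _parse_quantity(raw: str) -> DvQuantity:
    # Assumed input format: "<magnitude> <units>", e.g. "120 mm[Hg]".
    magnitude, units = raw.split(maxsplit=1)
    return DvQuantity(float(magnitude), units)


# Hypothetical dispatch table: RM data type name -> converter for raw form text.
CONVERTERS = {
    "DV_TEXT": str,
    "DV_COUNT": int,
    "DV_BOOLEAN": _parse_boolean,
    "DV_QUANTITY": _parse_quantity,
}


def to_rm_value(rm_type: str, raw: str):
    """Convert raw form input into a typed value for the given RM data type."""
    try:
        converter = CONVERTERS[rm_type]
    except KeyError:
        raise ValueError(f"unsupported RM data type: {rm_type}") from None
    return converter(raw)


print(to_rm_value("DV_QUANTITY", "120 mm[Hg]"))
```

A dispatch table keeps the type-specific conversion logic in one place, so supporting another RM data type only requires adding one entry.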
===Methodology===
Here we describe the analysis procedure applied to each formalism: ADL, XML, and OWL. The archetypes used for the analysis were downloaded from the Clinical Knowledge Manager (CKM) [38] (see Appendix C). Each knowledge representation was analyzed for how well it supports the client side and the server side of the EHR system. In other words, we explored each knowledge representation from both the user's and the machine's perspective.
====Representation in ADL====
The analysis of ADL proceeded as follows. A user selects a form for data entry. The corresponding .adl file is then retrieved from the archetype repository and parsed to extract the node paths of all mandatory fields, and a form is generated. When the user enters a value in any field of the archetype, the value is bound to the exact path of that field in the .adl file. The ADL formalism is therefore a powerful knowledge representation that can act as a GUI generator. When the data are to be persisted, intermediate classes map the user's inputs to the corresponding archetype onto the RM data types, and the results are stored in the database. Along with the data type and the data value, the path of the node in the .adl file and the name of the .adl file are also persisted, which aids the retrieval of the data from the database. Whenever the data are to be presented as a report, each stored value is retrieved, matched to the archetype node whose path is stored with it in the database, and displayed in the form corresponding to the archetype. ADL thus plays a major role in data persistence in this case. From this analysis, we conclude that ADL both provides object-oriented semantics and is machine-processable. It therefore plays a significant role on both the client side and the back end of an EHR system, which makes it a powerful knowledge formalism.
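The ADL workflow just described can be sketched in Python as follows. Parsing real ADL is out of scope here, so the sketch assumes that a parser has already produced a list of nodes, each with a path, an RM type, a label, and a lower occurrence bound. The archetype id, node paths, labels, and the `ehr_data` table layout are illustrative assumptions, not taken from KnowledgeRep or from CKM. The sketch shows three steps: generating a form from the mandatory nodes, persisting each value with its RM type, node path, and archetype id, and rebuilding a report by matching stored paths back to archetype nodes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchetypeNode:
    path: str        # openEHR-style node path inside the archetype
    rm_type: str     # RM data type the node is constrained to
    label: str       # human-readable text for the node
    min_occurs: int  # lower bound of the occurrences constraint (>= 1 means mandatory)


# Hypothetical, pre-parsed stand-in for what an ADL parser would return.
ARCHETYPE_ID = "openEHR-EHR-OBSERVATION.example.v1"
ITEMS = "/data[at0001]/events[at0006]/data[at0003]/items"
NODES = [
    ArchetypeNode(f"{ITEMS}[at0004]/value", "DV_QUANTITY", "Systolic", 1),
    ArchetypeNode(f"{ITEMS}[at0005]/value", "DV_QUANTITY", "Diastolic", 1),
    ArchetypeNode(f"{ITEMS}[at0033]/value", "DV_TEXT", "Comment", 0),
]


def build_form(nodes):
    """Return form fields (path -> label) for the mandatory nodes only."""
    return {n.path: n.label for n in nodes if n.min_occurs >= 1}


def persist(conn, archetype_id, nodes, entries):
    """Store each value together with its RM type, node path, and archetype id."""
    rm_types = {n.path: n.rm_type for n in nodes}
    conn.executemany(
        "INSERT INTO ehr_data (archetype_id, path, rm_type, value) VALUES (?, ?, ?, ?)",
        [(archetype_id, p, rm_types[p], str(v)) for p, v in entries.items()],
    )


def report(conn, archetype_id, nodes):
    """Rebuild a label -> value report by matching stored paths to archetype nodes."""
    labels = {n.path: n.label for n in nodes}
    rows = conn.execute(
        "SELECT path, value FROM ehr_data WHERE archetype_id = ? ORDER BY rowid",
        (archetype_id,),
    )
    return {labels[path]: value for path, value in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ehr_data (archetype_id TEXT, path TEXT, rm_type TEXT, value TEXT)")
form = build_form(NODES)
user_input = dict(zip(form, [120, 80]))  # values typed into the generated form
persist(conn, ARCHETYPE_ID, NODES, user_input)
print(report(conn, ARCHETYPE_ID, NODES))  # {'Systolic': '120', 'Diastolic': '80'}
```

Storing the path and archetype id alongside each value is what allows the report step to work without any archetype-specific table design.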
However, parsing an ADL archetype returns objects defined by the archetype object model (AOM). The AOM is the definitive formal representation of archetype semantics, and it is independent of syntax. The primary goal of the AOM specification is to tell developers how to build archetype tools and EHR components that use archetypes. It can be used to generate the output side of parsers that process archetypes in a linguistic format, such as openEHR ADL. The semantics defined in the AOM are used to express the object structures of archetypes, and these objects are equivalent to a syntax tree. The AOM can be thought of as a model of an in-memory archetype or template, or as a standard syntax tree for any serialized format, not only ADL. An archetype's canonical abstract syntactic form is ADL, although it may also be parsed from and serialized to XML, JSON, or any other format. Calls to an appropriate AOM construction API from an archetype or template editing tool can also produce the in-memory archetype representation. This model is common to all dual-model-based standards, and it carries no information about the particular reference model for which the archetype was built. The resulting objects therefore cannot be used to perform semantic activities such as comparison, selection, or classification.
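The point that AOM objects carry no reference-model knowledge can be illustrated with a minimal Python sketch. `CObject` and `CAttribute` are simplified, hypothetical stand-ins loosely modeled on the AOM's object and attribute constraints, not the openEHR Java API. `RM_PARENTS` is a small, simplified excerpt of the openEHR data-types inheritance hierarchy. Without an explicit RM hierarchy, the only question that can be answered about two RM type names is whether they are the same string. Supplying the hierarchy makes a basic semantic comparison (subsumption) possible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CObject:
    """Simplified object constraint: the RM type is only a name here."""
    rm_type_name: str
    node_id: Optional[str] = None
    attributes: list[CAttribute] = field(default_factory=list)


@dataclass
class CAttribute:
    """Simplified attribute constraint holding child object constraints."""
    rm_attribute_name: str
    children: list[CObject] = field(default_factory=list)


def rm_type_names(node: CObject) -> list[str]:
    """Walk the syntax tree and collect every RM type name it mentions."""
    names = [node.rm_type_name]
    for attribute in node.attributes:
        for child in attribute.children:
            names.extend(rm_type_names(child))
    return names


def conforms(sub_type: str, super_type: str,
             rm_parents: Optional[dict[str, str]] = None) -> bool:
    """True if sub_type equals super_type or inherits from it."""
    if rm_parents is None:
        # AOM alone: type names are opaque strings, so only equality is decidable.
        return sub_type == super_type
    current: Optional[str] = sub_type
    while current is not None:
        if current == super_type:
            return True
        current = rm_parents.get(current)
    return False


# Simplified excerpt of the openEHR data-types hierarchy (child -> parent).
RM_PARENTS = {
    "DV_QUANTITY": "DV_AMOUNT",
    "DV_COUNT": "DV_AMOUNT",
    "DV_AMOUNT": "DV_QUANTIFIED",
    "DV_QUANTIFIED": "DV_ORDERED",
    "DV_ORDERED": "DATA_VALUE",
    "DV_TEXT": "DATA_VALUE",
}

element = CObject("ELEMENT", "at0004", [CAttribute("value", [CObject("DV_QUANTITY")])])
print(rm_type_names(element))                           # ['ELEMENT', 'DV_QUANTITY']
print(conforms("DV_QUANTITY", "DV_AMOUNT"))             # False
print(conforms("DV_QUANTITY", "DV_AMOUNT", RM_PARENTS)) # True
```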
====Representation in XML====
For the experimental analysis, an XML representation automatically generated by the archetype editor was used. This XML format conforms to the archetype constraint classes defined in the openEHR RM, and it can be imported directly into the openEHR RM to initialize the relevant archetypes. The XML is thus generated, via the archetype editor, from its ADL counterpart. Each piece of data can then be referenced by its XML path. For example, the patient's first name (Paul) could be referenced as "/record/name/firstname." The drawback of this formalism is its inability to represent the archetype constraints completely. It is, however, a suitable formalism for transforming data and exchanging it from one form to another.
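A minimal Python sketch of resolving the path from the example above with the standard library's `xml.etree.ElementTree` follows. The XML snippet is a hypothetical record containing only the first name from the text. ElementTree's limited XPath support expects paths relative to the element being searched, so the helper checks the root tag itself and then searches with the remainder of the path.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical record; only the first name "Paul" comes from the text above.
RECORD_XML = "<record><name><firstname>Paul</firstname></name></record>"


def resolve(xml_text: str, path: str) -> Optional[str]:
    """Return the text at an absolute path such as "/record/name/firstname"."""
    root = ET.fromstring(xml_text)
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return None  # the path does not start at this document's root element
    relative = "/".join(steps[1:]) or "."
    return root.findtext(relative)  # None if no element matches


print(resolve(RECORD_XML, "/record/name/firstname"))  # Paul
```

The path only locates data. Constraints such as whether `firstname` is mandatory are not expressed by it.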
====Representation in OWL====
The semantic web enables greater access not only to content but also to services on the web. [39] Users and software agents should be able to discover, invoke, compose, and monitor web resources that offer particular services and have specific properties, with a high degree of automation if desired. Service descriptions should enable powerful tools across the web service lifecycle. OWL-S (formerly DAML-S) [39] is an ontology of services that makes these functionalities possible.
Because the experiment used the openEHR Java Reference Implementation of the RM, we found that using the OWL formalism requires the RM (i.e., the information model) to be transformed into OWL statements, as OWL operates at a more abstract level than ADL, XML, and XML Schema. Once this is done, terminology can be bound to the EHR system with OWL as a bridging technology, provided the terminology is expressible in OWL.
Recent research [40] describes an ADL-to-OWL translation approach, techniques for mapping archetypes to formal ontologies, and how rules can be applied to the resulting representation. It translates definitions expressed in openEHR ADL into a formal representation expressed in OWL. The formal representations are then integrated with rules written as Semantic Web Rule Language (SWRL) [41] expressions, which provides a way to apply the SWRL rules to concrete instances of clinical data. Sharing the knowledge expressed in rules is consistent with the philosophy of open sharing encouraged by archetypes. The approach also allows the reuse of formal knowledge expressed through ontologies, and it extends reuse to propositions of declarative knowledge, such as those encoded in clinical guidelines.
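To make this idea concrete, the following Python sketch represents a translated archetype as a few subject-predicate-object triples and applies one SWRL-style rule to instance data by forward chaining. It is a toy stand-in, not the translation method of [40] and not an OWL reasoner or SWRL engine. The `ex:` names, the `ex:systolic` property, and the threshold of 140 are hypothetical values chosen for illustration only. They are not clinical guidance.

```python
# Hypothetical OWL-like axioms: the archetype becomes a class under the RM class
# it constrains, plus a subclass whose membership is defined by a rule.
TBOX = {
    ("ex:BloodPressureObservation", "rdfs:subClassOf", "ex:OBSERVATION"),
    ("ex:HighSystolicObservation", "rdfs:subClassOf", "ex:BloodPressureObservation"),
}

# Hypothetical instance data (concrete clinical data typed by the archetype class).
ABOX = {
    ("ex:obs1", "rdf:type", "ex:BloodPressureObservation"),
    ("ex:obs1", "ex:systolic", "150"),
    ("ex:obs2", "rdf:type", "ex:BloodPressureObservation"),
    ("ex:obs2", "ex:systolic", "118"),
}


def apply_rule(facts, threshold):
    """Hand-coded SWRL-style rule:
    BloodPressureObservation(?o) ^ systolic(?o, ?v) ^ greaterThan(?v, threshold)
        -> HighSystolicObservation(?o)
    """
    observations = {s for s, p, o in facts
                    if p == "rdf:type" and o == "ex:BloodPressureObservation"}
    return {
        (s, "rdf:type", "ex:HighSystolicObservation")
        for s, p, o in facts
        if p == "ex:systolic" and s in observations and float(o) > threshold
    }


def types_of(individual, facts, tbox):
    """Asserted types of an individual plus every superclass via rdfs:subClassOf."""
    types = {o for s, p, o in facts if s == individual and p == "rdf:type"}
    frontier = list(types)
    while frontier:
        cls = frontier.pop()
        for sub, pred, sup in tbox:
            if sub == cls and pred == "rdfs:subClassOf" and sup not in types:
                types.add(sup)
                frontier.append(sup)
    return types


facts = ABOX | apply_rule(ABOX, threshold=140)
print(sorted(types_of("ex:obs1", facts, TBOX)))
# ['ex:BloodPressureObservation', 'ex:HighSystolicObservation', 'ex:OBSERVATION']
```

The rule's conclusion is a class defined alongside the archetype, so inferred memberships inherit the same superclass chain back to the RM class.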
===Findings===
To the best of the authors' knowledge, KnowledgeRep has investigated the suitability of various formalisms in the context of archetypes, using the specific semantics of their underlying reference models and hierarchical structure (tree) of archetype definitions. It examines whether currently available archetype languages provide direct support for mapping to formal ontologies and then exploiting reasoning on clinical knowledge, which are critical ingredients of full semantic interoperability.
The comparisons obtained through KnowledgeRep are presented in Table 1, and the conclusions drawn from the results are shown in Table 2 and Table 3.
{|
| style="vertical-align:top;" |
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | colspan="6" style="background-color:white; padding-left:10px; padding-right:10px;" |'''Table 1.''' Comparative analysis of various knowledge representations (in the context of archetypes)
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" |Features
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" |ADL
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" |XML
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" |OWL
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" |OCL
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" |KIF
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |'''Domain Modeling'''
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Archetype Definition Language (ADL)
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Web document model
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Web-enabled ontologies for building the semantic web
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Constraints on object models (not on data) can describe archetypes
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Formal semantics (sharable among software entities)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |'''Reference Model (RM) with object-oriented semantics'''
  | style="background-color:white; padding-left:10px; padding-right:10px;" |ADL syntax adheres to object-oriented reference models (expressed in UML for constraints)
  | style="background-color:white; padding-left:10px; padding-right:10px;" |XML and XML schema languages do not follow object-oriented semantics
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Requires explicit expression of a reference model in OWL to represent archetype constraints
  | style="background-color:white; padding-left:10px; padding-right:10px;" |All statements are FOPL statements; it is impossible to express an archetype in a structural way
  | style="background-color:white; padding-left:10px; padding-right:10px;" |The existing information model and terminologies have to be converted into KIF statements to describe archetypes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |'''Constraint Representation'''
  | style="background-color:white; padding-left:10px; padding-right:10px;" |ADL enables constraints to be expressed in a structural and nested way for archetypes
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Strict rules in XML schema cannot express archetype constraints
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Inconvenient in OWL
  | style="background-color:white; padding-left:10px; padding-right:10px;" |OCL constraint types include function pre- and post-conditions and class invariants
  | style="background-color:white; padding-left:10px; padding-right:10px;" |
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |'''Path Traceability'''
  | style="background-color:white; padding-left:10px; padding-right:10px;" |ADL has a path syntax based on XPath (openEHR path) to deal with heavily nested structures
  | style="background-color:white; padding-left:10px; padding-right:10px;" |Inbuilt XPath mechanism
  | style="background-color:white; padding-left:10px; padding-right:10px;" |No inbuilt path mechanism
  | style="background-color:white; padding-left:10px; padding-right:10px;" |The OCL syntax for paths (that traverse associations) is similar to XPath
  | style="background-color:white; padding-left:10px; padding-right:10px;" |No inbuilt path mechanism
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |'''Inbuilt Ontology Section'''
  | style="background-color:white; padding-left:10px; padding-right:10px;" |ADL provides independence from natural language and terminology issues by having a separate ontology per archetype, containing "bindings" and language-specific translations
  | style="background-color:white; padding-left:10px; padding-right:10px;" |No built-in syntax
  | style="background-color:white; padding-left:10px; padding-right:10px;" |No built-in syntax and requires the semantics to be represented from first principles
  | style="background-color:white; padding-left:10px; padding-right:10px;" |No built-in syntax
  | style="background-color:white; padding-left:10px; padding-right:10px;" |No built-in syntax
|-
|}
|}
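The Path Traceability row of Table 1 contrasts ADL's XPath-like openEHR paths with formalisms that lack a built-in path mechanism. The Python sketch below parses such a path and follows it through an instance. It assumes a simplified reading of the path syntax, in which each step is an attribute name optionally followed by a bracketed node id. It also assumes a hypothetical JSON-like instance layout in which every archetyped node records its `archetype_node_id`. The node ids and values are illustrative only.

```python
import re

# One path step: an RM attribute name, optionally followed by [node_id].
SEGMENT = re.compile(r"([a-z_]+)(?:\[([^\]]+)\])?")


def parse_path(path):
    """Split an openEHR-style path into (attribute, node_id) steps; node_id may be None."""
    steps = []
    for part in path.strip("/").split("/"):
        match = SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"bad path segment: {part!r}")
        steps.append((match.group(1), match.group(2)))
    return steps


def resolve(instance, path):
    """Follow the steps through nested dicts/lists that record archetype_node_id."""
    node = instance
    for attribute, node_id in parse_path(path):
        value = node[attribute]
        if isinstance(value, list):
            # Multiply-valued attribute: the node id selects one child.
            matches = [c for c in value if c.get("archetype_node_id") == node_id]
            if not matches:
                raise KeyError(f"{attribute}[{node_id}] not found")
            value = matches[0]
        elif node_id is not None and value.get("archetype_node_id") != node_id:
            raise KeyError(f"{attribute}[{node_id}] not found")
        node = value
    return node


# Illustrative, heavily simplified instance of an observation.
INSTANCE = {
    "archetype_node_id": "openEHR-EHR-OBSERVATION.example.v1",
    "data": {
        "archetype_node_id": "at0001",
        "events": [{
            "archetype_node_id": "at0006",
            "data": {
                "archetype_node_id": "at0003",
                "items": [
                    {"archetype_node_id": "at0004", "value": {"magnitude": 120.0, "units": "mm[Hg]"}},
                    {"archetype_node_id": "at0005", "value": {"magnitude": 80.0, "units": "mm[Hg]"}},
                ],
            },
        }],
    },
}

print(resolve(INSTANCE, "/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value"))
```

Node ids disambiguate siblings in heavily nested, repeating structures, so a single path addresses the same data point in every instance built from the archetype.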





Revision as of 15:58, 12 June 2022

Full article title Using knowledge graph structures for semantic interoperability in electronic health records data exchanges
Journal Information
Author(s) Sachdeva, Shelly; Bhalla, Subhash
Author affiliation(s) National Institute of Technology Delhi, University of Aizu
Primary contact Email: shellysachdeva at nitdelhi dot ac dot in
Year published 2022
Volume and issue 13(2)
Article # 52
DOI 10.3390/info13020052
ISSN 2078-2489
Distribution license Creative Commons Attribution 4.0 International
Website https://www.mdpi.com/2078-2489/13/2/52/htm
Download https://www.mdpi.com/2078-2489/13/2/52/pdf (PDF)

Abstract

Information sharing across medical institutions is restricted to information exchange between specific partners. The lifelong electronic health record (EHR) structure and content require standardization efforts. Existing standards such as openEHR, Health Level 7 (HL7), and CEN TC251 EN 13606 (Technical committee on Health Informatics of the European Committee for Standardization) aim to achieve data independence along with semantic interoperability. This study aims to identify a knowledge representation that enables semantic health data exchange. openEHR and CEN TC251 EN 13606 use archetype-based technology for semantic interoperability. The HL7 Clinical Document Architecture is moving toward adopting this approach through HL7 templates. Archetypes are the basis for knowledge-based systems, as they are a means of defining clinical knowledge.

The paper examines a set of formalisms for their suitability for describing, representing, and reasoning about archetypes. Each of the information exchange technologies (XML, Web Ontology Language (OWL), Object Constraint Language (OCL), and Knowledge Interchange Format (KIF)) is evaluated as part of the knowledge representation experiment for how well it represents archetypes as described by the Archetype Definition Language (ADL). The evaluation maintains a clear focus on the syntactic and semantic transformations among different EHR standards.

Keywords: archetypes, electronic health records, dual-model approach, knowledge representation, EHR, XML, ADL, OWL, KIF

Introduction

Healthcare is a continuously evolving domain. New findings about diseases and clinical treatments are continuously being made, which has raised the need for increased information exchange among medical institutions. Electronic health records (EHRs) contain the medical history and treatments of patients at those institutions. In the classical approach, information and knowledge are stored together. However, storing each clinical concept in a single relation led to a huge data model that was difficult to manage and expensive to maintain. Among the existing interoperability approaches for EHRs, the dual-model approach [1] seems to be the most promising. It consists of an information layer and a knowledge layer. The key benefit of this approach is the segregation of knowledge (represented as archetypes [2]) from information. A clinical concept is, in effect, transferred through an intermediate knowledge graph structure. This study analyzes the components of that knowledge graph.

Knowledge graphs are used to capture knowledge in application-based situations that require large-scale integration, management, and extraction of value from a variety of data sources. Recent studies examine all currently available knowledge graphs (KGs), including their characteristics, approaches, applications, issues, and challenges. [3,4]

This paper focuses on using knowledge representation and information interchange technologies for archetype representation.

Overview of EHRs

The domain of the modern EHR is complex. It consists of different types of data (from textual to multimedia), with new data requirements emerging over time. For example, there are about 300,000 medical terms at present (as defined by SNOMED CT), and medical tests and procedures are constantly being created and modified. EHRs have a complex structure based on archetypes and may include data on hundreds of parameters, such as temperature, blood pressure, and body mass index (BMI). Each individual parameter (or concept) has its own specific content and is represented as an archetype. For instance, an archetype could contain an item such as "data," which might hold a documented heart rate observation. Such an archetype ideally offers complete knowledge about a clinical context (i.e., attributes of data), the data's "state" (i.e., context for interpretation of data), and its "protocol" (i.e., information regarding the gathering of data) (see Appendix A). Various standards development organizations are working to improve the semantic interoperability of EHRs through these archetypes and other means.

It is desirable for EHR systems to be both functionally and semantically interoperable. Interoperability can be defined as the ability to communicate data such that the data are sufficient to perform the tasks at the receiving system. The associated data items must have the same meaning for the creators at the sending party and the users at the receiving party, and the tasks performed using the data must be to the satisfaction of the receiving party. To tackle the EHR interoperability problem, many authorized organizations have defined standards. Examples include Health Level 7 (HL7) and its Clinical Document Architecture (CDA), ASTM International's Continuity of Care Record (CCR), European Committee for Standardization (CEN) Technical Committee 251 and International Organization for Standardization (ISO)'s ISO/EN 13606, and the openEHR Foundation's openEHR. The main objective of all these EHR standards is to structure the data and mark up the content of the medical information so that it can be exchanged more readily.

For this work, three levels of interoperability stand out, namely syntactic (data) interoperability, structural interoperability/semantic interpretability, and semantic interoperability. [5] The main mechanisms for interoperability are reference models, archetypes, and domain knowledge governance.

Archetypes and semantic interoperability

The standards for semantic interoperability (such as CDA, openEHR, and ISO/EN 13606) endorse the two-level modeling approach for storing EHR content. [6] This approach consists of two layers that separate information modeling from content (knowledge) modeling. The reference (information) model layer represents the generic structures of the components of healthcare data. The content model, on the other hand, represents more domain-specific data, which are generally unstable because of their variability and high rate of change in usage (e.g., a formal description of a physical examination or prescription).

In openEHR and ISO/EN 13606, the first level is known as the reference model (RM) and the second level consists of archetypes. The RM defines the basic structure and represents the generic structures of the components of healthcare data at the storage level (i.e., information modeling). At the second level, the archetype model (AM) constrains the generic structure to encompass logical semantics and thus provides a standard definition that aids semantic interoperability. The AM provides deliverables in the form of archetypes and templates. An archetype provides the meta-description of structured clinical records as a computable formalism. In HL7's CDA, the two levels are the Reference Information Model (RIM) and the HL7 templates, which function essentially the same as the archetype concept.

These standards support compatibility with one another. In the case of openEHR and ISO/EN 13606, the only means of achieving interoperability with a generic information model is through their archetypes. In fact, ADL archetypes can be defined against any Unified Modeling Language (UML) model, and it is also possible to write archetypes against the HL7 Version 3 RIM and the CDA in general. Kilic and Dogac [7] note this in their work, describing how the clinical statements of two different EHR standards derived from the same RIM can be mapped to each other using archetypes, Refined Message Information Model derivations, and semantic tools.

As mentioned earlier, an archetype is an agreed-upon, formal, and interoperable specification of a reusable clinical data set that underpins an EHR. It captures as much information as possible about a particular, discrete clinical concept. [2] Under the dual-model approach, which consists of a knowledge layer (archetypes) and a reference model, a conceptual definition of data can be developed as archetypes in terms of constraints on the structure, types, values, and behaviors of reference model classes. An example of a simple archetype is "Weight," which can be used in multiple places within an EHR as required.

Semantics in archetypes have a dual nature. They consist of both structural and terminological components. The structure of an archetype provides support for semantics, while EHR component links form a set of interrelated conceptual, clinical entities. Each entity has a set of terminological bindings associated with it (specified by links to terms of specific medical terminologies).

If data elements are created and modified using archetypes, the archetypes constrain the configuration of data instances so that they are valid according to the archetype. Archetypes are thus a paradigm for building semantically enabled software systems, providing data validation, clinical modeling (by domain experts), a basis for querying, and form design. An archetype might define or constrain relationships between data values within a data structure; these are expressed as algorithms, formulas, or rules. An archetype's metadata define its core concept, purpose, use, evidence, authorship, and versioning. An archetype also ensures a maximal dataset: it contains all the relevant information regarding a clinical concept. Once the format of an archetype is agreed upon and published, it is held in a "library" and made available for use in any part of a given application by multiple vendor systems, multiple institutions, and multiple geographical regions. Each group or entity using the same archetype will understand and compute data captured by the same archetype in another clinical environment. Thus, an archetype serves the following key purposes [8,9]:

  1. It allows domain experts (clinicians) to capture data for their information systems.
  2. It provides runtime validation of data input, thus improving data entry quality.
  3. It provides a basis for intelligent querying of data.
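Runtime validation of data input (purpose 2) can be sketched as checking each entry against the constraints an archetype places on its nodes. In this minimal Python sketch, the `QuantityConstraint` class, the paths `/systolic` and `/diastolic`, the units, and the 0 to 1000 range are hypothetical simplifications. They are not constraints copied from a published archetype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityConstraint:
    """Hypothetical archetype constraint on a quantity-valued node."""
    units: str
    lower: float
    upper: float
    mandatory: bool = True


def validate(constraints, entry):
    """Check an entry (path -> (magnitude, units)) against archetype constraints.

    Returns a list of human-readable errors; an empty list means the entry is valid.
    """
    errors = []
    for path, c in constraints.items():
        if path not in entry:
            if c.mandatory:
                errors.append(f"{path}: missing mandatory value")
            continue
        magnitude, units = entry[path]
        if units != c.units:
            errors.append(f"{path}: expected units {c.units}, got {units}")
        elif not (c.lower <= magnitude <= c.upper):
            errors.append(f"{path}: {magnitude} outside [{c.lower}, {c.upper}]")
    # Values for nodes the archetype does not define are rejected as well.
    for path in sorted(entry.keys() - constraints.keys()):
        errors.append(f"{path}: not defined by the archetype")
    return errors


CONSTRAINTS = {
    "/systolic": QuantityConstraint("mm[Hg]", 0.0, 1000.0),
    "/diastolic": QuantityConstraint("mm[Hg]", 0.0, 1000.0),
}

print(validate(CONSTRAINTS, {"/systolic": (120.0, "mm[Hg]"), "/diastolic": (80.0, "mm[Hg]")}))  # []
print(validate(CONSTRAINTS, {"/systolic": (1200.0, "mm[Hg]")}))
```

Returning every error, rather than stopping at the first one, lets a data-entry form flag all problem fields at once.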

Representing internal data in archetypes

Matching clinical data to codes in controlled terminologies is the first step toward achieving data standardization for safe and accurate data interoperability. Archetypes have the advantage of being able to separate the internal model data from formal terminologies. Existing terminologies, taxonomies, and ontologies have been written in many languages. For example, Medical Subject Headings (MeSH) [10] and the National Cancer Institute (NCI) [11] have their own proprietary formalisms (now commonly also expressed in XML). The "term binding" section of the archetype describes the equivalences between archetype local terms and terms found in external terminologies, such as SNOMED CT or the Unified Medical Language System (UMLS). The internal data are assigned local names and later bound or mapped to external terminology codes. This feature eliminates the need to change the model whenever the terminology changes. For formal descriptions, the Archetype Definition Language (ADL) uses three other syntaxes (cADL, the constraint form of ADL; dADL, the data definition form of ADL; and a version of first-order predicate logic, FOPL) to describe constraints on data that are instances of some information model (e.g., expressed in UML). [12] Thus, ADL can be used to write archetypes for any domain in which formal object models that describe data instances exist.
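The separation between local codes and term bindings can be sketched with plain Python dictionaries. The local codes, the terminology keys, and especially the external codes are placeholders invented for illustration; they are not real SNOMED CT or LOINC codes. The point is only that data are recorded against local codes, so supporting another terminology means adding bindings rather than changing the model or the stored data.

```python
# Local, terminology-independent codes and their definitions (illustrative).
TERM_DEFINITIONS = {"at0004": "Systolic", "at0005": "Diastolic"}

# Term bindings: terminology name -> local code -> external code.
# The external codes are placeholders, not real terminology codes.
TERM_BINDINGS = {
    "SNOMED-CT": {"at0004": "SCT-PLACEHOLDER-1", "at0005": "SCT-PLACEHOLDER-2"},
}

# Data are captured against local codes only.
RECORD = {"at0004": 120, "at0005": 80}


def export(record, terminology, bindings, definitions):
    """Return (code, label, value) triples, using bound external codes where available.

    Local codes without a binding for the requested terminology are passed through.
    """
    mapping = bindings.get(terminology, {})
    return [(mapping.get(code, code), definitions[code], value)
            for code, value in record.items()]


print(export(RECORD, "SNOMED-CT", TERM_BINDINGS, TERM_DEFINITIONS))

# Supporting another terminology only adds bindings; RECORD and
# TERM_DEFINITIONS stay untouched.
extended = {**TERM_BINDINGS, "LOINC": {"at0004": "LOINC-PLACEHOLDER-1"}}
print(export(RECORD, "LOINC", extended, TERM_DEFINITIONS))
```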

EHRs and data modeling

The openEHR architecture [1] includes a design principle called "ontological separation," which regulates the EHR modeling (Figure 1). The model consists of two main categories: "ontologies of information" and "ontologies of reality." The ontologies of information contain the information models of the EHR content, whereas the ontologies of reality describe real phenomena with descriptions and classifications.



Fig. 1 Illustration of openEHR’s ontological structure

The ontologies of information are divided into several models:

  1. Domain content models (knowledge models) containing formal definitions of the clinical content. These are developed using archetypes, which are designed so that they can change when new clinical needs arise.
  2. Information representation models are implemented in the electronic healthcare system's software. These are used as a foundation for the domain content models and are designed to be stable regarding model changes. In openEHR, this component is named the "reference model" (RM).

In simpler terms, if the RM is equivalent to a set of letters/digits, then each archetype is a grammar constraining which strings can be expressed for that archetype. In formal terms:

RM = {Set of classes C1, C2, C3 ...Cn}
Archetype = {Set of rules for valid combination of classes of RM}
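This alphabet-and-grammar analogy can be made executable with a small Python sketch. Here the RM is a set of class names, and an archetype is a set of rules stating which classes may be combined under which attributes. The class names are a simplified subset loosely based on the openEHR RM (in the real RM, for example, EVENT is abstract). The rules and the tuple-based tree encoding are hypothetical. Real archetypes also constrain occurrences, cardinality, and values.

```python
# Simplified RM: the set of classes C1..Cn.
RM_CLASSES = {"OBSERVATION", "HISTORY", "EVENT", "ITEM_TREE", "ELEMENT", "DV_QUANTITY", "DV_TEXT"}

# Hypothetical archetype: rules for valid combinations of RM classes,
# expressed as (parent class, attribute) -> permitted child classes.
ARCHETYPE_RULES = {
    ("OBSERVATION", "data"): {"HISTORY"},
    ("HISTORY", "events"): {"EVENT"},
    ("EVENT", "data"): {"ITEM_TREE"},
    ("ITEM_TREE", "items"): {"ELEMENT"},
    ("ELEMENT", "value"): {"DV_QUANTITY"},
}


def conforms(node, rules, rm_classes):
    """node = (class_name, {attribute: [child nodes]}); True if RM and archetype both allow it."""
    class_name, attributes = node
    if class_name not in rm_classes:
        return False  # not even a "letter" of the RM alphabet
    for attribute, children in attributes.items():
        allowed = rules.get((class_name, attribute))
        if allowed is None:
            return False  # the archetype does not permit this attribute here
        for child in children:
            if child[0] not in allowed or not conforms(child, rules, rm_classes):
                return False
    return True


def observation(value_class):
    """Build a small instance tree whose ELEMENT holds a value of value_class."""
    element = ("ELEMENT", {"value": [(value_class, {})]})
    tree = ("ITEM_TREE", {"items": [element]})
    event = ("EVENT", {"data": [tree]})
    history = ("HISTORY", {"events": [event]})
    return ("OBSERVATION", {"data": [history]})


print(conforms(observation("DV_QUANTITY"), ARCHETYPE_RULES, RM_CLASSES))  # True
print(conforms(observation("DV_TEXT"), ARCHETYPE_RULES, RM_CLASSES))      # False
```

DV_TEXT is a valid RM class (a valid "letter") but is rejected because this archetype's grammar only permits DV_QUANTITY as the element value.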

The ontologies of reality can be broken down as:

  1. Classifications: ICDx (International Classification of Diseases) and ICPC (International Classification of Primary Care)
  2. Process descriptions: Clinical guidelines
  3. Descriptive terminologies: SNOMED CT or LOINC

In Figure 1, the EHR extracts are based on commonly shared archetypes, which are proposed as a means of exchanging information between different health care providers. [1] The semantics of the domain content models (e.g., archetypes) are provided by terminology binding. The meaning of nodes in archetypes is given by textual descriptions and references to external terminology systems, in the form of term definitions and term bindings. This paper describes the representation of archetypes in several formalisms, such as ADL, XML (Extensible Markup Language), and OWL (Web Ontology Language). These information exchange technologies are then compared, and the advantages and disadvantages of each are analyzed. The main aim of this study is to find the best representation of an archetype. The paper examines a set of formalisms for their suitability for describing, representing, and reasoning about archetypes.

Formalisms in transformation of archetypes among different standards: Background

Prevalent standards such as CDA, openEHR, and ISO/EN 13606 use archetype-based technology for semantically interoperable exchange. ADL archetypes can be written against any UML model, and it would be possible to write archetypes directly against the RIM [13], and also the CDA specification [14] (using a UML expression derived from its XML schema). These standards define the structure and the markup of the clinical content to make EHR exchange interoperable. They all rely on dual-model technology for semantic interoperability. They have different classes in the reference model, having abstract semantics. Although the names of the classes are not shared between these standards, their semantics are similar. The RM for all the standards is stable. If alignments are performed between archetypes of different standards, aligning algorithms based on similarity measures will fail, as class names (of RM) are disparate. Dictionary-based approaches will not be of much help, as all names are quite abstract. [15] The various ongoing exchange efforts based on archetypes use different formalisms to represent archetypes. The mapping among different standards makes use of model management and uses OWL transformations and XML representation of archetypes.

For interoperability among EHR standards based on ontologies, the possible approaches are (i) building common ontology and (ii) reusing existing ontologies and combining them. The first approach requires the transformation of ADL archetypes into OWL. [16,17] The ontological information and archetype models have been compared to find similarities and differences among the CEN and openEHR representations. The software tool available based on this approach is Poseacle convertor. [18] This approach involves the use of XML and OWL representations, with the latter being obtained by reusing existing ontologies and combining them through ontology mapping. The ARTEMIS project was implemented based on this, using the various formalisms such as XML, ADL, and OWL. [19]

Broadly speaking, archetypes must be agreed upon before communication. However, it does not seem feasible to expect all professionals of various disciplines to agree on every detail of the archetypes associated with the data they would like to exchange. If this approach becomes widely accepted, the number of available archetypes will certainly become very large. Although archetypes are annotated with terms from standardized ontologies (terminologies, taxonomies, etc.), there will still be differences at the archetype level and at the terminology level. Because of competing standards, local variations at the archetype level will stem from the specialization of archetypes for specific purposes and research projects. Further, several widely used terminologies could be used to annotate archetypes (e.g., SNOMED CT, MeSH, NCI). Local ontologies are also used to annotate archetypes; therefore, a sound and general process for matching archetypes is essential.

In 2019, Adel et al. [20] proposed a unified framework based on a fuzzy ontology to show how to exploit semantic web technologies to support EHR semantic interoperability. Prior to that, Martinez-Costa et al. [21] introduced ontologies and rules as a means of establishing interoperability amongst heterogeneous health systems (openEHR and HL7). Recently, Roehrs et al. [22] proposed an application model called OmniPHR, a model for assessing the structure of semantic interoperability and database integration from various health standards. OmniPHR uses artificial intelligence, natural language processing, and a standard ontology to achieve interoperability. Knowledge graphs are represented using a variety of methodologies, but machine learning methods are frequently employed to create a low-dimensional representation that can support a wide range of applications. [23]

Given this, there is a need to determine how suitable the various formalisms are for achieving full semantic interoperability through archetypes; this question is addressed in the current research.

Knowledge representation

In a 2019 review of more than 100 papers on knowledge representation in health care, Riaño et al. [24] found that ontologies (31%), semantic web-related formalisms (26%), decision tables and rules (19%), logic (14%), and probabilistic models (10%) were the most common knowledge representation approaches. They also found that medical informatics knowledge was primarily represented as computer-interpretable clinical recommendations (43%), medical domain ontologies (26%), and EHRs (22%). [24] In most of these cases, the meaning of the concepts represented by an archetype can be conveyed by embedding codes from a commonly recognized terminology at appropriate points in the archetype. Archetypes are the unit of communication between interoperating applications, as they define the minimum context that must be considered for safe communication. [15] Expressivity is a key parameter in choosing or creating a knowledge representation (KR). It is easier and more compact to express a fact or element of knowledge within the semantics and grammar of a more expressive KR. However, more expressive languages are likely to require more complex logic and algorithms to construct equivalent inferences. A highly expressive KR is also less likely to be complete and consistent, whereas less expressive KRs may be both complete and consistent. [25] This section describes the knowledge technologies for the representation of archetypes.

ADL and XML

The ADL approach uses existing UML semantics and existing terminologies and adds a convenient syntax for expressing the required constraints. Expressing the semantics of archetypes using XML-based exchange formats leads to the conflation of abstract and concrete representational semantics. [12] ADL syntax is straightforward and powerful, and it has allowed mappings to other formalisms to be defined and understood more correctly. Previously, archetypes were expressed as XML instance documents conforming to W3C XML schemas [26], for example, in the Good Electronic Health Record (GEHR) [27] and openEHR projects. Subsequently, expressing archetype constraints using various schema languages for XML (such as XML Schema, RELAX NG, and Schematron) was examined; because of the issues reported, these languages were abandoned for archetype validation. [28,29] For example, in XML Schema, classes in the RM were mapped to complex types, and archetypes were mapped to restrictions of those types. The strict rules governing the restriction feature of XML Schema (unique particle attribution, complex enumerations, and the placement of regular expression constraints) did not permit the implementation of archetype constraints.

With ADL parsing tools, it is possible to convert ADL to any number of forms, including various XML formats. XML instances can be generated from the in-memory object form of an archetype; XML archetypes are thus serialized instances of the parse tree, i.e., particular ADL archetypes serialized from objects into XML instances. Archetypes connect information structures to formal terminologies. Like XML data, they are path-addressable, using path expressions that are directly convertible to XPath expressions. An XML schema corresponding to the ADL object model has been published at openEHR.org [30], and an XML schema corresponding to the RM is published in the work of Martinez-Costa et al. [21]
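As an illustration of the path conversion, the sketch below rewrites archetype node-id predicates in an openEHR-style path into XPath attribute tests. It assumes that each XML element carries its node id in an `archetype_node_id` attribute (as in openEHR XML instances) and uses a simplified, namespace-free instance; the path and node ids are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Matches archetype node-id predicates such as [at0004] or [at0000.1].
NODE_ID = re.compile(r"\[(at\d+(?:\.\d+)*)\]")


def openehr_path_to_xpath(path: str) -> str:
    """Rewrite [atNNNN] predicates as XPath attribute tests on archetype_node_id."""
    return NODE_ID.sub(r"[@archetype_node_id='\1']", path)


# Simplified instance: real openEHR XML uses a namespace and a richer structure.
INSTANCE = """<observation archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1">
  <data archetype_node_id="at0001">
    <events archetype_node_id="at0006">
      <data archetype_node_id="at0003">
        <items archetype_node_id="at0004">
          <value><magnitude>120</magnitude></value>
        </items>
      </data>
    </events>
  </data>
</observation>"""

path = "/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude"
xpath = openehr_path_to_xpath(path)
root = ET.fromstring(INSTANCE)
# ElementTree's XPath subset needs a path relative to the root element.
print(root.find("." + xpath).text)  # 120
```

ElementTree supports only a subset of XPath, but attribute-equality predicates are part of it, which is all this conversion needs.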

XML plays a role in meeting the exchange requirement: the EHR_Extract used for exchange is expressed in XML. [1] A recent study examines whether W3C XML Schema provides a practicable solution for the semantic validation of standard-based EHR documents. [28] The EHR_Extract needs to be validated against the RM and the associated archetypes.

An example of combined XML/ADL use can be found in openEHR. To accept a report from the pathology laboratory for inclusion in a patient's EHR repository, an XML form is generated from the (ADL) archetype. This form is shared with the laboratory for on-site validation of data input. Thus, XML serves as an input and transport medium.
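On-site validation of such a form could look roughly like the sketch below, which checks a submitted XML report against range and unit constraints. The element names, the constraint table, and its values are hypothetical stand-ins for constraints that would be derived from a real pathology archetype; they are not taken from CKM.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical constraints derived from a pathology archetype:
# element path -> (lower bound, upper bound, required unit). Values are illustrative.
CONSTRAINTS: Dict[str, Tuple[float, float, str]] = {
    "result/haemoglobin": (0.0, 300.0, "g/L"),
    "result/platelets": (0.0, 2000.0, "10*9/L"),
}


def validate_report(xml_text: str) -> List[str]:
    """Check a submitted XML form against the constraints; an empty list means it passed."""
    root = ET.fromstring(xml_text)
    problems = []
    for path, (low, high, unit) in CONSTRAINTS.items():
        node = root.find(path)
        if node is None:
            problems.append(f"{path}: missing")
            continue
        if node.get("units") != unit:
            problems.append(f"{path}: expected units {unit}, got {node.get('units')}")
        try:
            value = float(node.text or "")
        except ValueError:
            problems.append(f"{path}: not a number")
            continue
        if not low <= value <= high:
            problems.append(f"{path}: {value} outside {low}..{high}")
    return problems


good = ('<lab_report><result><haemoglobin units="g/L">135</haemoglobin>'
        '<platelets units="10*9/L">250</platelets></result></lab_report>')
bad = '<lab_report><result><haemoglobin units="g/L">900</haemoglobin></result></lab_report>'
print(validate_report(good))  # []
print(validate_report(bad))   # ['result/haemoglobin: 900.0 outside 0.0..300.0', 'result/platelets: missing']
```

Returning a list of problems rather than failing on the first one lets the laboratory correct every field in a single round trip.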

A comparison between ADL and XML is given below and in Appendix B:

  1. Both are machine processable.
  2. ADL is human readable, whereas XML can be hard to read (e.g., XML schema instances, OWL-RDF ontologies).
  3. ADL adheres to object-oriented semantics, particularly for container types, whereas XML schema languages do not follow object-oriented semantics.
  4. For ontological reference, ADL has domain entities/archetypes, and XML has global terms/concepts.
  5. ADL uses attributes, and XML uses attributes and sub-elements to represent object properties.
  6. ADL uses nearly half of its required space (storage) for tags, and XML may have data redundancy in contents.
  7. In terms of efficiency, ADL is a domain-specific language (sufficiently rich to capture and model the medical domain) in comparison to XML, which is good for web document modeling, though with limited ability to represent database contents.
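Item 5 has a practical consequence: the same object property may arrive either as an XML attribute or as a sub-element, and a consumer has to accept both. The sketch below (element names illustrative, loosely following an openEHR DV_QUANTITY) normalizes the two encodings to the same Python dictionary.

```python
import xml.etree.ElementTree as ET

# The same quantity, encoded with XML attributes and with sub-elements.
AS_ATTRIBUTES = '<value magnitude="120" units="mm[Hg]"/>'
AS_ELEMENTS = '<value><magnitude>120</magnitude><units>mm[Hg]</units></value>'


def read_quantity(xml_text: str) -> dict:
    """Collect object properties from both XML attributes and child elements."""
    node = ET.fromstring(xml_text)
    props = dict(node.attrib)              # properties carried as attributes
    for child in node:                     # properties carried as sub-elements
        props[child.tag] = (child.text or "").strip()
    return {"magnitude": float(props["magnitude"]), "units": props["units"]}


print(read_quantity(AS_ATTRIBUTES) == read_quantity(AS_ELEMENTS))  # True
```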

ADL and OWL

OWL [31] is a language for the semantic web, which:

  • offers expressiveness and the possibility of reasoning over the information it describes;
  • allows making annotations on classes or properties and makes semantic similarity functions available;
  • is related to terminologies (e.g., SNOMED CT is currently in the process of adapting its representation to semantic web environments [32], recognizing that having a representation of both the clinical and terminological information in the same formalism would facilitate better clinical knowledge management and would enrich archetypes by adding more information to them);
  • brings all information concerning a particular term together through modeling (e.g., code, definition, bindings, and translations); and
  • is supported by several tools which can process it.

OWL has an abstract syntax and is exchanged mainly through an XML-based serialization of the Resource Description Framework (RDF/XML). OWL is based on description logic (DL) and is primarily used to describe “classes” of things in a way that supports subsumption inferencing within the ontology and, by extension, on data that are instances of ontology classes. OWL includes intersection, union, complement, existential quantification, universal quantification, minimum cardinality, maximum cardinality, equivalence, and specialization. Its DL-based sublanguage is designed to remain decidable, so it offers expressiveness together with the possibility of reasoning over the information it describes. OWL 2 adds qualified cardinality restrictions (which make it possible to capture ADL occurrences constraints), property chain inclusions, and an OWL/XML syntax.
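To illustrate how qualified cardinality can capture ADL occurrences, the sketch below translates an occurrences interval into OWL 2 restrictions written in Manchester syntax. The property name `items` and the filler class `SystolicElement` are hypothetical, and a complete translation would also need the RM and terminology mappings discussed in this section.

```python
from typing import List, Optional, Tuple


def parse_occurrences(text: str) -> Tuple[int, Optional[int]]:
    """Parse an ADL-style interval such as '0..1', '1..*' or '1' (None = unbounded)."""
    low, _, high = text.partition("..")
    if not high:
        return int(low), int(low)
    return int(low), None if high == "*" else int(high)


def occurrences_to_owl(prop: str, filler: str, lower: int, upper: Optional[int]) -> List[str]:
    """Express lower..upper as OWL 2 qualified cardinality restrictions (Manchester syntax)."""
    if upper is not None and upper < lower:
        raise ValueError("upper bound below lower bound")
    if upper is not None and upper == lower:
        return [f"{prop} exactly {lower} {filler}"]
    restrictions = []
    if lower > 0:                      # a lower bound of 0 adds no restriction
        restrictions.append(f"{prop} min {lower} {filler}")
    if upper is not None:              # '*' means no upper restriction
        restrictions.append(f"{prop} max {upper} {filler}")
    return restrictions


for interval in ["0..1", "1..*", "1", "2..5"]:
    print(interval, occurrences_to_owl("items", "SystolicElement", *parse_occurrences(interval)))
```

When two restrictions are returned, they would be conjoined (with `and`) in the class description; an unbounded `0..*` interval produces no cardinality restriction at all.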

ADL provides a rich set of constraints on primitive types, including dates and times; however, its syntactic orientation is a significant drawback for achieving semantic interoperability. Consequently, formalizing the exchange and transformation processes is harder than with semantic-oriented models such as ontologies. In addition, the syntactic approach makes important archetype-related tasks, such as comparing and classifying archetypes, difficult.

Archetypes in ADL can be represented in OWL. However, it requires [12]:

  • the expression of the relevant reference models in OWL;
  • the expression of the relevant terminologies in OWL;
  • the representation of concepts (i.e., constraints) independently of natural language; and
  • the conversion of the cADL part of an archetype to OWL.

Using OWL-expressed archetypes to validate data (which would require massive amounts of data to be converted to OWL statements) is unlikely to be anywhere near as efficient as using archetypes expressed in ADL or one of its concrete expressions. [12] The UML-like representation is not suitable for formal reasoning at the conceptual level, whereas OWL offers inferencing power of far wider scope than such special-purpose reasoning.

The UML specification of the RM, and the XML schema mapped from it, follow object-oriented semantics, and these semantics must be mapped when converting to OWL. This is possible in practice, as illustrated in a recent study. [16] Each XML-schema class (complex type) is mapped to an OWL class with the same name, restricted data type definitions in the XML schema are mapped to OWL data types, XML-schema attributes are mapped to OWL properties, and the instances (objects) in XML correspond to individuals in OWL.
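These mapping rules can be sketched as follows: complex types become OWL classes, XML-schema attributes become data properties (with domain and range), restricted simple types become datatypes, and XML instances become individuals. The output is OWL 2 functional syntax with an unspecified default prefix `:`. The schema fragment is a toy example rather than the published openEHR schema, and this sketch is not the tool used in the cited study.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"


def owl_datatype(xsd_type: str) -> str:
    """Rename the xs: prefix to the xsd: prefix conventional in OWL documents."""
    return "xsd:" + xsd_type.split(":", 1)[1] if xsd_type.startswith("xs:") else xsd_type


def xsd_to_owl(xsd_text: str) -> Tuple[List[str], Dict[str, str]]:
    """Map complex types to classes, attributes to data properties and
    restricted simple types to datatypes (OWL 2 functional syntax)."""
    root = ET.fromstring(xsd_text)
    axioms: List[str] = []
    ranges: Dict[str, str] = {}
    for ctype in root.findall(f"{XS}complexType"):
        cls = ctype.get("name")
        axioms.append(f"Declaration(Class(:{cls}))")
        for attr in ctype.iter(f"{XS}attribute"):
            prop = attr.get("name")
            rng = owl_datatype(attr.get("type", "rdfs:Literal"))
            ranges[prop] = rng
            axioms += [f"Declaration(DataProperty(:{prop}))",
                       f"DataPropertyDomain(:{prop} :{cls})",
                       f"DataPropertyRange(:{prop} {rng})"]
    for stype in root.findall(f"{XS}simpleType"):
        restriction = stype.find(f"{XS}restriction")
        if restriction is not None:
            name = stype.get("name")
            axioms += [f"Declaration(Datatype(:{name}))",
                       f"DatatypeDefinition(:{name} {owl_datatype(restriction.get('base'))})"]
    return axioms, ranges


def instance_to_owl(xml_text: str, individual: str, ranges: Dict[str, str]) -> List[str]:
    """Map an XML instance to an OWL individual with typed data property assertions.
    Raises KeyError for attributes that the schema did not declare."""
    node = ET.fromstring(xml_text)
    axioms = [f"ClassAssertion(:{node.tag} :{individual})"]
    for prop, value in node.attrib.items():
        axioms.append(f'DataPropertyAssertion(:{prop} :{individual} "{value}"^^{ranges[prop]})')
    return axioms


XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="DV_QUANTITY">
    <xs:attribute name="magnitude" type="xs:double"/>
    <xs:attribute name="units" type="xs:string"/>
  </xs:complexType>
  <xs:simpleType name="Percentage">
    <xs:restriction base="xs:double"/>
  </xs:simpleType>
</xs:schema>"""

axioms, ranges = xsd_to_owl(XSD)
print("\n".join(axioms))
instance = instance_to_owl('<DV_QUANTITY magnitude="120" units="mm[Hg]"/>', "q1", ranges)
print(instance)
```

Attributes reused across several complex types would need disambiguation in a fuller version, because OWL interprets multiple `DataPropertyDomain` axioms as an intersection.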

Kilic et al. [33] describe a mapping of archetypes to OWL. The ehr2ont framework [34] automatically transforms archetype definitions from ADL into OWL archetype ontologies. LinkEHR [35] is a more recent project for obtaining the OWL representation of archetypes.

ADL and other formalisms

A comparative analysis of ADL with other formalisms, such as the Object Constraint Language (OCL) and the Knowledge Interchange Format (KIF), is given in the next section. OCL constrains all instances of a class to conform to a specific configuration of instances. In contrast, ADL makes it possible to create numerous archetypes, each describing in detail a concrete configuration of instances of a class. ADL archetypes can also include invariants (expressed in a syntax similar to OCL).
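The difference can be sketched in plain Python (no OCL tooling is used): one class-wide invariant applies to every instance, while archetype-style constraints define several narrower configurations of the same class. The class, the archetype names, and the ranges are hypothetical and illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Element:
    """Simplified stand-in for an RM class carrying a quantity."""
    magnitude: float
    units: str


def class_invariant(e: Element) -> bool:
    """OCL-style rule for every instance, roughly: context Element inv: self.magnitude >= 0"""
    return e.magnitude >= 0


# Archetype-style constraints: several configurations of the same class,
# each narrowing it for one clinical concept (ranges are illustrative only).
ARCHETYPES: Dict[str, Callable[[Element], bool]] = {
    "systolic_pressure": lambda e: e.units == "mm[Hg]" and e.magnitude <= 1000,
    "body_temperature": lambda e: e.units == "Cel" and 25 <= e.magnitude <= 45,
}


def conforms(e: Element, archetype: str) -> bool:
    """An instance must satisfy the class-wide invariant and the chosen archetype."""
    return class_invariant(e) and ARCHETYPES[archetype](e)


reading = Element(magnitude=120, units="mm[Hg]")
print(conforms(reading, "systolic_pressure"))  # True
print(conforms(reading, "body_temperature"))   # False
```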

The Resource Description Framework (RDF) is a data model for objects (“resources”) and the relations between them, and it provides simple semantics for this data model. [36] OWL takes the essential fact-stating ability of RDF and the class- and property-structuring capabilities of RDF Schema and extends them in essential ways. OWL classes can be specified as logical combinations (intersections, unions, or complements) of other classes or as enumerations of specified objects, going beyond the capabilities of RDF Schema. Like RDF Schema, OWL can declare classes and organize them in a subsumption (“subclass”) hierarchy. The significant extension over RDF Schema is OWL's ability to restrict how properties behave locally to a class. A recent study concentrates on the semantic interoperability of diverse EHRs and their standards, proposing the transformation of heterogeneous EHR datasets into XML syntactic models and their translation into a common ontological representation for semantic knowledge acquisition. [37]
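For contrast with OWL, the sketch below implements only RDF Schema subsumption: it applies the RDFS entailment rules rdfs11 (transitivity of `rdfs:subClassOf`) and rdfs9 (instances inherit superclasses) until nothing new is derived. The `ex:` class names are hypothetical, loosely following the openEHR RM inheritance OBSERVATION, CARE_ENTRY, ENTRY; OWL's class-local property restrictions are beyond what this sketch shows.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS entailment rules until a fixpoint is reached:
    rdfs11 (subClassOf is transitive) and rdfs9 (instances inherit superclasses)."""
    inferred = set(triples)
    while True:
        new = set()
        for (s1, p1, o1) in inferred:
            for (s2, p2, o2) in inferred:
                # (s1 p1 o1) + (o1 subClassOf o2)  =>  (s1 p1 o2) for p1 in {subClassOf, type}
                if p2 == SUBCLASS and o1 == s2 and p1 in (SUBCLASS, TYPE):
                    new.add((s1, p1, o2))
        if new <= inferred:
            return inferred
        inferred |= new


triples = {
    ("ex:Observation", SUBCLASS, "ex:CareEntry"),
    ("ex:CareEntry", SUBCLASS, "ex:Entry"),
    ("ex:bp1", TYPE, "ex:Observation"),
}
closure = rdfs_closure(triples)
print(sorted(o for s, p, o in closure if s == "ex:bp1" and p == TYPE))
# ['ex:CareEntry', 'ex:Entry', 'ex:Observation']
```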

Evaluation in practice

We ran into many problems concerning alternative representations of the archetype and the inability to express some of the constraints in different knowledge representation languages. We detail these problems as a cautionary tale to others planning to use pre-existing archetypes for semantic interoperability, as a list of issues to consider when describing concepts formally in any language, and as a collection of criteria for evaluating alternative representations.

The three knowledge formalisms best suited to archetype representation (ADL, XML, and OWL) have been evaluated for the features mentioned in this paper through the experimental setup KnowledgeRep. ADL has been compared with XML and OWL to determine the appropriate role of each formalism in a working semantically interoperable system.

KnowledgeRep: Simulation

Various knowledge formalisms used in different EHR standards for obtaining semantic interoperability have been examined in the current research with the openEHR standard. The evaluation setup, referred to as KnowledgeRep, is built on the underlying Java Reference Implementation [19] of openEHR and is used to examine various archetype representations for EHR concepts. Figure 2 shows the application GUI (graphical user interface), the knowledge representations, and the communication classes. The application GUI handles the interactions between users and the EHR system. It makes calls to the utility and server-side classes to format and transform the data and to extract relevant data. It comprises the following modules: the form presented to the user, the template mechanism required to create the template corresponding to the EHR system, the data binding module, and the result viewer, which presents the report requested by the user. All of these modules are built with GWT (Google Web Toolkit), which introduces the necessary flexibility into the model-driven approach of standard-based EHR system development.


Fig. 2 KnowledgeRep: Evaluation setup for various archetype representations for EHR concepts

The knowledge representations (archetypes) in ADL, XML, and OWL formalisms and the communication classes are shown in the second part of the diagram. This part contains the components responsible for communication, as well as the logic that connects the GUI with the underlying Java Reference Implementation of openEHR. The entire set of knowledge is parsed from ADL/XML/OWL files using the classes of the RM and mapped to the GUI (form). The knowledge representation and communication part of the setup comprises the ArchetypeWrapper and DatabaseWrapper classes, which act as the communication nodes. ArchetypeDAO maps between the database and the basic RM data types. The clinical software database is also shown (its details are beyond the scope of the current research).
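The role of a data access object (DAO) in this setup can be sketched with SQLite. The class below is a hypothetical stand-in, not the actual ArchetypeDAO of KnowledgeRep or any class of the openEHR Java Reference Implementation: it converts a few basic RM data types into values SQLite stores natively and keeps each value alongside its archetype file name and node path.

```python
import sqlite3
from typing import Any, Callable, Dict, Tuple

# Hypothetical conversions from a few basic openEHR RM data types to values
# that SQLite stores natively.
CONVERT: Dict[str, Callable[[Any], Any]] = {
    "DV_TEXT": str,
    "DV_QUANTITY": float,
    "DV_COUNT": int,
    "DV_BOOLEAN": lambda v: 1 if str(v).lower() in ("true", "1") else 0,
}


class ArchetypeDAOSketch:
    """Illustrative stand-in for a DAO: each value is stored with the archetype
    file and node path it was bound to, so it can be mapped back later."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS entry ("
                     "archetype_file TEXT, node_path TEXT, rm_type TEXT, value)")

    def save(self, archetype_file: str, values: Dict[str, Tuple[str, Any]]) -> None:
        """values maps node path -> (RM data type name, raw user input)."""
        rows = []
        for path, (rm_type, raw) in values.items():
            if rm_type not in CONVERT:
                raise ValueError(f"unsupported RM data type: {rm_type}")
            rows.append((archetype_file, path, rm_type, CONVERT[rm_type](raw)))
        with self.conn:  # commit on success, roll back on error
            self.conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)", rows)

    def load(self, archetype_file: str) -> Dict[str, Any]:
        """Return node path -> stored value for one archetype file."""
        cur = self.conn.execute(
            "SELECT node_path, value FROM entry WHERE archetype_file = ? ORDER BY rowid",
            (archetype_file,))
        return dict(cur.fetchall())


dao = ArchetypeDAOSketch(sqlite3.connect(":memory:"))
dao.save("openEHR-EHR-OBSERVATION.blood_pressure.v1.adl", {
    "/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value": ("DV_QUANTITY", "120"),
})
print(dao.load("openEHR-EHR-OBSERVATION.blood_pressure.v1.adl"))
```

Storing the path and file name next to each value is what allows a report to be rebuilt later without re-deriving where each value belongs.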

Methodology

Here we describe the analysis procedure applied to each formalism: ADL, XML, and OWL. The archetypes used for the analysis were downloaded from the Clinical Knowledge Manager (CKM) [38] and are listed in Appendix C. Each knowledge representation has been analyzed for how well it supports the client side and the server side of the EHR system; in other words, we explored knowledge representation from both the user's perspective and the machine's perspective.

Representation in ADL

The ADL analysis was performed as follows. A user selects a form for data entry; the corresponding .adl file is picked up from the archetype repository and parsed to extract the node paths of all mandatory fields; and a form is then generated. When the user enters the value of any field in the archetype, the value is bound to the exact path of that node in the .adl file. The ADL formalism thus proves to be a powerful knowledge representation that can act as a GUI generator. When the data are to be persisted, the user's inputs to the corresponding archetype are mapped to RM data types by intermediate classes and stored in the database. Along with the data type and the data value, the path of the node in the .adl file and the name of the .adl file are also persisted, which aids retrieval of the data from the database. Whenever the data are to be presented as a report, each data value is picked up, mapped to the node whose path is stored with it in the database, and displayed in the form corresponding to the archetype. The ADL formalism therefore plays a major role in data persistence in this case. From the above analysis, we conclude that ADL both provides object-oriented semantics and is machine-processable; thus, it plays a significant role on both the client side and the backend of an EHR system, making it a powerful knowledge formalism.
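The form-generation and binding steps can be sketched as follows, assuming the .adl file has already been parsed into a simple tree (parsing itself is not shown). A leaf is treated as a mandatory field only if it and all of its ancestors have a lower occurrences bound of at least 1; this policy, the `Node` class, the labels, and the node ids are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    attribute: str            # RM attribute holding this node, e.g. "items"
    node_id: str              # archetype node id, e.g. "at0004"
    label: str                # display text from the archetype's ontology section
    min_occurs: int = 1       # lower bound of the ADL occurrences constraint
    children: List["Node"] = field(default_factory=list)


def mandatory_paths(node: Node, prefix: str = "") -> Dict[str, str]:
    """Return {path: label} for leaves reachable only through mandatory nodes."""
    if node.min_occurs < 1:
        return {}
    path = f"{prefix}/{node.attribute}[{node.node_id}]"
    if not node.children:
        return {f"{path}/value": node.label}
    fields: Dict[str, str] = {}
    for child in node.children:
        fields.update(mandatory_paths(child, path))
    return fields


def bind(fields: Dict[str, str], user_input: Dict[str, str]) -> Dict[str, str]:
    """Bind each form entry to the archetype path of its field (labels assumed unique)."""
    missing = [label for label in fields.values() if not user_input.get(label)]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    return {path: user_input[label] for path, label in fields.items()}


# Illustrative tree; node ids and the optional "Comment" element are made up.
tree = Node("data", "at0001", "History", children=[
    Node("events", "at0006", "Any event", children=[
        Node("data", "at0003", "Data", children=[
            Node("items", "at0004", "Systolic"),
            Node("items", "at0005", "Diastolic"),
            Node("items", "at0099", "Comment", min_occurs=0),
        ])])])

form = mandatory_paths(tree)          # the generated form: path -> field label
print(list(form.values()))           # ['Systolic', 'Diastolic']
print(bind(form, {"Systolic": "120", "Diastolic": "80"}))
```

The bound dictionary (path to value) is exactly what a persistence layer would need in order to store each value with its node path.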

However, parsing an ADL archetype returns objects according to the archetype object model (AOM). The AOM is the definitive formal representation of archetype semantics and is independent of syntax. The primary goal of the AOM specification is to tell developers how to build archetype tools and EHR components that use archetypes. It can be used to generate the output side of parsers that process archetypes in a linguistic format, such as openEHR ADL. The semantics defined in the AOM are used to express the object structures of archetypes; these objects are equivalent to a syntax tree. The AOM can be thought of as a model of an in-memory archetype or template, or as a standard syntax tree for any serialized format, not only ADL. An archetype's canonical abstract syntactic form is ADL, although it may also be parsed from and serialized to XML, JSON, or any other format. Calls to an appropriate AOM construction API from an archetype or template editing tool can also produce the in-memory archetype representation. This model is common to all dual-model-based standards, so it carries no information about the particular reference model for which the archetype was built. Thus, the obtained objects cannot be used to perform any semantic activity such as comparison, selection, or classification.
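The syntax independence of the object model can be illustrated with a toy tree whose classes are loosely modelled on the AOM classes C_COMPLEX_OBJECT and C_ATTRIBUTE (the real AOM has many more classes and fields). The same in-memory tree is serialized to JSON and to XML and read back unchanged; the XML element names are made up for this sketch.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CComplexObject:
    """Loosely modelled on the AOM class C_COMPLEX_OBJECT."""
    rm_type_name: str          # just a string: the tree knows nothing about the RM
    node_id: str
    attributes: List["CAttribute"] = field(default_factory=list)


@dataclass
class CAttribute:
    """Loosely modelled on the AOM class C_ATTRIBUTE."""
    rm_attribute_name: str
    children: List[CComplexObject] = field(default_factory=list)


def from_dict(d: dict) -> CComplexObject:
    """Rebuild the tree from the nested dicts produced by asdict/json."""
    return CComplexObject(d["rm_type_name"], d["node_id"], [
        CAttribute(a["rm_attribute_name"], [from_dict(c) for c in a["children"]])
        for a in d["attributes"]])


def to_xml(obj: CComplexObject) -> ET.Element:
    """Serialize the tree to an ad hoc XML form."""
    el = ET.Element("object", rm_type_name=obj.rm_type_name, node_id=obj.node_id)
    for attr in obj.attributes:
        a = ET.SubElement(el, "attribute", name=attr.rm_attribute_name)
        for child in attr.children:
            a.append(to_xml(child))
    return el


def from_xml(el: ET.Element) -> CComplexObject:
    """Rebuild the tree from the ad hoc XML form."""
    return CComplexObject(el.get("rm_type_name"), el.get("node_id"), [
        CAttribute(a.get("name"), [from_xml(c) for c in a])
        for a in el.findall("attribute")])


tree = CComplexObject("OBSERVATION", "at0000", [
    CAttribute("data", [CComplexObject("HISTORY", "at0001")])])

as_json = json.dumps(asdict(tree))
as_xml = ET.tostring(to_xml(tree), encoding="unicode")
print(from_dict(json.loads(as_json)) == tree)    # True
print(from_xml(ET.fromstring(as_xml)) == tree)   # True
```

Note that `rm_type_name` is only a string: nothing in the tree says what OBSERVATION means in the reference model, which is why such objects alone cannot support semantic comparison or classification.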

Representation in XML

For the experimental analysis, an XML representation automatically generated by the archetype editor was used. This XML format conforms to the archetype constraint classes defined in the openEHR RM and can be directly imported into the openEHR RM to initialize the relevant archetypes. The XML format is thus generated, via the archetype editor, from a corresponding ADL counterpart. Each piece of data can then be referenced by its XML path; for example, the patient's first name (Paul) could be referenced as “/record/name/firstname.” The drawback of this formalism is its inability to represent the archetype constraints completely. It is, however, a suitable formalism for transforming data and exchanging them from one form to another.
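Resolving such a path is straightforward with Python's `xml.etree.ElementTree`, which supports a subset of XPath evaluated relative to an element. The minimal sketch below assumes a hypothetical record layout matching the path in the text.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical record matching the path used in the text.
DOC = "<record><name><firstname>Paul</firstname></name></record>"


def resolve(root: ET.Element, absolute_path: str) -> Optional[str]:
    """Resolve an absolute path like /record/name/firstname against a parsed document.
    ElementTree only accepts paths relative to an element, so the root step is
    checked here and the remaining steps are passed to find()."""
    steps = absolute_path.strip("/").split("/")
    if steps[0] != root.tag:
        return None
    node = root.find("/".join(steps[1:])) if len(steps) > 1 else root
    return None if node is None else node.text


root = ET.fromstring(DOC)
print(resolve(root, "/record/name/firstname"))   # Paul
print(resolve(root, "/record/name/middlename"))  # None
```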

Representation in OWL

The semantic web enables greater access not only to content but also to services on the web. [39] Users and software agents should discover, invoke, compose, and monitor web resources offering particular services and having specific properties. They should be able to do so with a high degree of automation if desired. Powerful tools should be enabled by service descriptions across the web service lifecycle. OWL-S (formerly DAML-S) [39] is an ontology of services that makes these functionalities possible.

Because the experiment used the openEHR Java Reference Implementation of the RM, it was found that using the OWL formalism requires transforming the RM (i.e., the information model) into OWL statements (OWL operates at a more abstract level than ADL, XML, and XML schema). Given such a transformation, OWL can serve as a bridging technology for binding terminology to the EHR system, provided that the terminology is expressible in OWL.

Recent research [40] describes the ADL-to-OWL translation approach, describes the techniques to map archetypes to formal ontologies, and demonstrates how rules can be applied to the resulting representation. It translates definitions expressed in the openEHR ADL to a formal representation expressed using OWL. The formal representations are then integrated with rules expressed with Semantic Web Rule Language (SWRL) [41] expressions, providing an approach to apply the SWRL rules to concrete instances of clinical data. Sharing the knowledge expressed in rules is consistent with the philosophy of open sharing, encouraged by archetypes. The approach also allows the reuse of formal knowledge, expressed through ontologies, and extends reuse to propositions of declarative knowledge, such as those encoded in clinical guidelines.
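The effect of such a rule on instance data can be emulated in plain Python, without an OWL reasoner or SWRL engine. The sketch below mimics a rule that uses the SWRL built-in `swrlb:greaterThan`; the class and property names and the threshold of 140 are hypothetical and illustrative, not a clinical recommendation.

```python
from typing import Dict, Set, Tuple

# A tiny set of OWL-style facts (names are hypothetical).
class_assertions: Set[Tuple[str, str]] = {("Observation", "obs1"), ("Observation", "obs2")}
data_assertions: Dict[Tuple[str, str], float] = {("systolic", "obs1"): 150.0,
                                                 ("systolic", "obs2"): 118.0}


def apply_rule(classes: Set[Tuple[str, str]],
               data: Dict[Tuple[str, str], float],
               threshold: float = 140.0) -> Set[Tuple[str, str]]:
    """Emulate the SWRL rule
    Observation(?o) ^ systolic(?o, ?s) ^ swrlb:greaterThan(?s, 140) -> HighSystolic(?o)
    by adding the derived class assertions (a single pass suffices because the
    head class does not occur in the body)."""
    derived = set()
    for cls, individual in classes:
        if cls != "Observation":
            continue
        value = data.get(("systolic", individual))
        if value is not None and value > threshold:
            derived.add(("HighSystolic", individual))
    return classes | derived


result = apply_rule(class_assertions, data_assertions)
print(sorted(ind for cls, ind in result if cls == "HighSystolic"))  # ['obs1']
```

Like SWRL, the emulation only adds facts and never retracts them, so re-running it on its own output changes nothing.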

Findings

To the best of the authors' knowledge, KnowledgeRep has investigated the suitability of various formalisms in the context of archetypes, using the specific semantics of their underlying reference models and hierarchical structure (tree) of archetype definitions. It examines whether currently available archetype languages provide direct support for mapping to formal ontologies and then exploiting reasoning on clinical knowledge, which are critical ingredients of full semantic interoperability.

Through KnowledgeRep, various comparisons, presented in Table 1, have been obtained. The conclusions of various results are also shown in Table 2 and Table 3.

Table 1. Comparative analysis of various knowledge representations (in the context of archetypes)

| Feature | ADL | XML | OWL | OCL | KIF |
| --- | --- | --- | --- | --- | --- |
| Domain modeling | Archetype Definition Language (ADL) | Web document model | Web-enabled ontologies for building the semantic web | Constraints on object models (not on data); can describe archetypes | Formal semantics (sharable among software entities) |
| Reference model (RM) with object-oriented semantics | ADL syntax adheres to object-oriented reference models (expressed in UML for constraints) | XML and XML schema languages do not follow object-oriented semantics | Requires explicit expression of a reference model in OWL to represent archetype constraints | All statements are FOPL statements; it is impossible to express an archetype in a structural way | Existing information model and terminologies have to be converted to KIF statements to describe archetypes |
| Constraint representation | ADL enables constraints to be expressed in a structural and nested way for archetypes | Strict rules in XML schema cannot express archetype constraints | Inconvenient in OWL | OCL constraint types include function pre- and post-conditions, and class invariants | |
| Path traceability | ADL has a path syntax based on XPath (openEHR path) to deal with heavily nested structures | Inbuilt XPath mechanism | No inbuilt path mechanism | The OCL syntax for paths (that traverse associations) is similar to XPath | No inbuilt path mechanism |
| Inbuilt ontology section | ADL provides independence from natural language and terminology issues by having a separate ontology per archetype, containing "bindings" and language-specific translations | No built-in syntax | No built-in syntax; requires the semantics to be represented from first principles | No built-in syntax | No built-in syntax |


References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. The original references are in alphabetical order; this version places them in order of appearance, by design.