Journal:Recommendations for achieving interoperable and shareable medical data in the USA

Full article title Recommendations for achieving interoperable and shareable medical data in the USA
Journal Communications Medicine
Author(s) Szarfman, Ana; Levine, Jonathan G.; Tonning, Joseph M.; Weichold, Frank; Bloom, John C.; Soreth, Janice M.; Geanacopoulos, Mark; Callahan, Lawrence; Spotnitz, Matthew; Ryan, Qin; Pease-Fye, Meg; Brownstein, John S.; Hammond, W. Ed; Reich, Christian; Altman, Russ B.
Author affiliation(s) U.S. Food and Drug Administration, independent researcher/contractor, Your Health Concierge, Purdue University, Columbia University, Boston Children’s Hospital, Duke Clinical & Translational Science Institute, Stanford University School of Medicine
Primary contact Email: ana dot szarfman at fda dot hhs dot gov
Year published 2022
Volume and issue 2
Article # 86 (2022)
DOI 10.1038/s43856-022-00148-x
ISSN 2730-664X
Distribution license Creative Commons Attribution 4.0 International
Website https://www.nature.com/articles/s43856-022-00148-x
Download https://www.nature.com/articles/s43856-022-00148-x.pdf (PDF)

Abstract

Easy access to large quantities of accurate health data is required to understand medical and scientific information in real time; evaluate public health measures before, during, and after times of crisis; and prevent medical errors. Introducing a system in the United States of America that allows for efficient access to such health data and ensures auditability of data facts, while avoiding data silos, will require fundamental changes in current practices. Here, we recommend the implementation of standardized data collection and transmission systems, universal identifiers for individual patients and end users, a reference standard infrastructure to support calibration and integration of laboratory results from equivalent tests, and modernized working practices. Requiring comprehensive and binding standards, rather than incentivizing voluntary and often piecemeal efforts for data exchange, will allow us to achieve the analytical information environment that patients need.

Keywords: drug development, public health, interoperability, data exchange, medical informatics

Introduction

Reported worldwide deaths from COVID-19 have surpassed six million, with over 16% of those deaths occurring in the United States of America alone. [1] Despite our vaccination efforts against COVID-19, the analytical deficiencies of the country's health information systems uncovered by the pandemic remain largely unresolved. [2] We still cannot answer basic questions that should be answerable by a simple query of the data, such as "what is the mortality rate according to patient variables?" Also, public health systems and practitioners are still forced to rely on outmoded forms of communication (e.g., paper and fax), which do not provide rapid access to needed information.

Although recognized as a leader in advancing cutting-edge biomedical research and medical technology, the USA continues to rely on multiple independent healthcare information systems, and versions of those systems, that cannot seamlessly communicate with each other. This lack of interoperability within and across hospital systems, laboratories, public health programs, physicians’ offices, and regulatory and research data resources hinders rapid improvements in medical treatment, public health, decision-making, and research. The main reason for the failure to achieve interoperability—and for the information loss, inefficient operations, and huge (and frequently hidden) costs that result—is the lack of comprehensive, centrally coordinated, fully validated, traceable, and enforceable medical data collection and transmission standards. [3] Bi- and multi-directional feedback loops that are needed for prompt access to ancillary data, for clarifications, and for quickly reporting and addressing system and data errors are also lacking. Without easy access to this additional information, electronic health records (EHRs) cannot be made portable [4], and the full potential of those records to support research and innovation cannot be realized.

Easily exchangeable data is a goal that many parties have long advocated. The COVID-19 pandemic has made this issue more urgent than before, as we have “to move faster than the virus” (personal communication from Dr. Mirta Roses). Unfortunately, to date the COVID-19 pandemic has only underscored the consequences of having information systems that rely on non-binding standards for data management and exchange, standards that are themselves based on multiple, unreconciled data models. In principle, a data model should provide universal definitions of data elements (i.e., units of data having a precise meaning and interpretation) for users of heterogeneous data sites that want to share or aggregate data, to allow them to speak a common language. [5]

Emerging technologies that promise to revolutionize healthcare add additional urgency to efforts to achieve interoperability in our health information systems. Within a decade, there will be many more sources of data that patients use to record information [6], for example wearable devices such as the Apple Watch and Fitbit. By combining these data with artificial intelligence and machine learning, in which machines automatically process the data, diagnosis and prediction of patient outcomes could be improved. However, the successful use of these computational tools strongly depends on accurate collection and exchange of massive amounts of complex data derived from next-generation sequencing, imaging devices, laboratory assays, and many other sources. Unfortunately, the data being collected remain predominantly in local silos, and frequently the data are neither standardized nor of the quality required by these advanced automated approaches. [7,8] The promise of all of these technological and scientific advances will be unrealized without interoperable standards that are fully representative of real-world clinical data (not just based on theoretical examples), fit-for-purpose for data collection, exchange, integration, and analysis, and traceable to the original information.

Perhaps the most challenging roadblock to implementing interoperability for data collection is the tolerance for highly customized, proprietary health information systems and their unique versions. The inconsistencies created by unnecessary customization create a state of confusion that makes it impossible to reliably identify, in a timely fashion, the critical data facts and discrepancies that must be communicated to those making critical decisions about how these systems should be designed and implemented. These decision makers include those in government organizations, the system vendors, software developers, and other stakeholders, including patients and patient advocates.

By presenting the following description of deficiencies in the US health information system and recommendations for addressing them at their root causes, we hope to stimulate constructive dialog among multiple stakeholders and inform policy changes in the US and other countries where such measures are needed. Recognizing our ethical responsibility to rapidly provide the best information to help patients [9], we propose building an alternative, more transparent system based on interoperability that starts at the data collection stage. This alternative system would be one in which the benefits of new computational technologies can be realized, where patients are able to take control of their data, and where accurate and timely data can be rapidly shared to advance medical research and improve public health.

The lack of universal and harmonized data collection and transmission standards

To date, policies to increase interoperability in our health information systems have been based on downstream transactions; for example, they seek to improve e-prescribing, billing, health information exchange, certification of EHRs [10,11], and regulatory submissions. These policies do not enforce a universal standard for the collection and transmission of defined variables and values for each data element, even for straightforward information such as demographic data. [5] Lacking universal standards, most health data exchange is therefore subject to the custom constraints of a multitude of unique, proprietary health information technologies, and the non-interoperable, disparate versions of the data elements in these systems. For example, proprietary health information systems modify most lab data received in their databases by mapping them to built-in terms. This results in a multitude of data conversion cycles that are difficult to document and untangle, because they are not traceable to the original data elements. Mapping and remapping from the irregular internal codes of each system version to the standardized versions needed for exchanging data is an error-prone, inefficient, and costly process that is repeated in reverse at the receiving end(s) when integrating the exchanged data back into the internal codes of the system version in which they were received.
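
As a minimal illustration of this mapping-and-remapping problem, the Python sketch below (using hypothetical local codes) shows how a proprietary vocabulary that collapses two method-specific laboratory codes into one built-in term makes the reverse mapping ambiguous and severs traceability to the original data elements.

  # Minimal sketch (hypothetical codes) of the lossy mapping/remapping
  # cycle described above. A sending system maps two distinct local
  # glucose tests to one internal term; the receiver cannot recover
  # which test produced a given result.

  # Sending lab's local codes, each carrying method-specific meaning
  local_codes = {
      "GLU-SER": "Glucose, serum, hexokinase method",
      "GLU-POC": "Glucose, whole blood, point-of-care meter",
  }

  # The proprietary system's built-in vocabulary collapses both codes
  to_internal = {"GLU-SER": "GLUCOSE", "GLU-POC": "GLUCOSE"}

  # Reverse mapping at the receiving end is ambiguous: one internal
  # term, two possible sources. Specimen and method distinctions are gone.
  from_internal = {}
  for local, internal in to_internal.items():
      from_internal.setdefault(internal, []).append(local)

  print(from_internal["GLUCOSE"])  # ['GLU-SER', 'GLU-POC'] -- untraceable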

There are ongoing efforts by the Office of the National Coordinator for Health Information Technology (ONC) and the Centers for Medicare and Medicaid Services (CMS) in the US to support data exchange via the secure Fast Healthcare Interoperability Resources (FHIR) standard and associated application programming interfaces (APIs) (i.e., software exchange engines) created by the Health Level Seven International (HL7) healthcare standards organization. [12] However, without common, enforceable, and well-documented data structure and coding across pertinent health information systems and APIs, data exchange may still require manual and frequently blinded mapping, which makes rapid transfer of information unfeasible.
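
For concreteness, the following sketch shows what a FHIR-based retrieval looks like in practice; the server address and patient identifier are hypothetical, and a real deployment would add SMART on FHIR authorization. Note that the API only moves the payload: the interoperability of its contents still depends on the codes and units each source system supplied.

  # A minimal sketch of a FHIR search over HL7's REST API, assuming a
  # hypothetical server base URL and patient ID.
  import requests

  BASE = "https://fhir.example.org/r4"        # hypothetical endpoint
  params = {
      "patient": "123",                        # hypothetical patient ID
      "code": "http://loinc.org|2345-7",       # LOINC: glucose, serum/plasma
  }
  resp = requests.get(f"{BASE}/Observation", params=params,
                      headers={"Accept": "application/fhir+json"})
  bundle = resp.json()

  # Even with FHIR, the payload is only as interoperable as the codes
  # and units each source system placed in it.
  for entry in bundle.get("entry", []):
      obs = entry["resource"]
      value = obs.get("valueQuantity", {})
      print(obs["code"]["coding"][0].get("code"),
            value.get("value"), value.get("unit"))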

To achieve a health system that enables continuous improvements, we need systems that collect the data that are most important for patient care, for accomplishing critical analyses, for enhancing the level of evidence, and for addressing public health challenges. [13,14,15,16] Therefore, we must focus on developing universal standards for the collection and validation of the most clinically important data as they are created (e.g., results from centrally calibrated laboratory tests during the entire course of clinical care). Only when such standards are in place can we ensure that valid information is being correctly captured and delivered. We must also ensure that the diverse software, transfer engines, and information technology systems can correctly interpret these standards, and process standard nomenclatures and notations without corruption. Redundant backup systems, feedback loops for prompt and early identification and communication of problems, and automated data verification processes will be needed to ensure data integrity and identify and correct the sources of transmission errors. Options should be provided for the public to monitor the accuracy of their medical data throughout all encounters (e.g., prescriptions, diagnoses, procedures), in the same way they can monitor their interactions with the Social Security system or banking institutions.
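
One simple form such automated verification could take is checksum confirmation at the receiving end. The sketch below (with hypothetical field names) flags a corrupted value so it can be routed back through a feedback loop rather than silently entering the record.

  # A minimal sketch of automated transmission verification: the sender
  # ships a digest with the payload, and the receiver recomputes it to
  # detect corruption before the data enter the record.
  import hashlib, json

  def checksum(payload: dict) -> str:
      """Hash a canonical serialization so both ends get the same digest."""
      canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
      return hashlib.sha256(canonical).hexdigest()

  sent = {"patient": "123", "test": "GLU-SER", "value": 5.4, "unit": "mmol/L"}
  digest = checksum(sent)

  received = dict(sent)
  assert checksum(received) == digest      # intact transmission passes

  received["value"] = 54.0                 # a corrupted decimal point
  assert checksum(received) != digest      # corruption is flagged for feedback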

A recent collaborative effort between HL7 International, which provides common standards for exchange of data in healthcare, and the Observational Health Data Sciences and Informatics (OHDSI) collaborative, which defines and maintains the common data model known as OMOP for international observational research studies, seeks to implement a single, shared data model for assembling and sharing information gathered in clinical care. This undertaking should enable us to integrate clinical data within huge repositories for advanced analytics, without the information loss caused by sequential mapping and remapping from and to a multitude of untraceable data models. [17] However, HL7 and OHDSI are not providing interoperable standards for the collection of factual data into EHRs. Without strong legislative support, funding, and enforcement, an interoperable model for data collection at the source that can fully address our most critical health information needs will not become a reality.
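
As a rough sketch of the approach, the record below lays out a laboratory result along the lines of the OMOP common data model's measurement table, in which standard concept identifiers sit alongside the original source values; the concept IDs shown are illustrative placeholders, not verified vocabulary entries.

  # A minimal sketch of a lab result mapped once to OMOP-style
  # conventions: source values are kept alongside standard concept IDs,
  # so the record stays traceable to the original data element.
  measurement = {
      "person_id": 123,                       # hypothetical patient
      "measurement_concept_id": 3004501,      # standard concept (illustrative)
      "measurement_source_value": "GLU-SER",  # original local lab code
      "value_as_number": 5.4,
      "unit_concept_id": 8753,                # standard unit concept (illustrative)
      "unit_source_value": "mmol/L",          # original unit as received
  }

  # Analyses query the standard columns; audits consult the *_source_value
  # columns, avoiding repeated remapping between proprietary vocabularies.
  print(measurement["measurement_concept_id"], measurement["value_as_number"])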

Although recommendations for addressing data interoperability in our health information systems are described in the policy documents of organizations involved in oversight [11] and in the scientific literature [18], much of the medical and scientific community remains insufficiently aware of the limitations of these systems, and tackling our widespread usability problems has not become a universally shared priority. We suggest that recent failures in disease prediction models [19,20,21] can be attributed in part to irregularities in how data are captured, exchanged, and maintained, and to our inability to systematically access and compare these data across multiple EHR systems and versions over time.

To build quality data systems, we must have reliable enforcement mechanisms in place to monitor the implementation of and adherence to interoperable data standards. To monitor this process, we need to conduct good clinical practice (GCP) inspections and adopt reliable monitoring tools and enforcement mechanisms (analogous to those used by the Treasury Department to ensure the integrity of monetary transactions). These inspections will require highly trained professionals capable of detecting inaccurate data, improper coding, and failures of the prediction models that clinicians rely on.

There are limits to the time healthcare professionals can (and should) spend entering data. A core principle of informatics is that data should be entered only once, and whenever possible by the device collecting the data. In scenarios where automated entry is not possible, an interoperable system should facilitate data entry and coding by providing automated, interactive graphic representations of the data already in the system, along with smart options using standard terminology for outcomes associated with given symptoms, diseases, medications, and patient profiles. Establishing high-quality hardware and software systems for collecting and delivering interoperable and fully traceable healthcare data to users would also create a dynamic in which it would be easier to assess the value and cost of data, and what additional data should be captured. Furthermore, creating large numbers of positions for scientifically trained clinical information professionals to manage health information systems, and supporting those positions with reimbursable billing codes, would increase the value of these systems for caregivers, researchers, and patients.

Issues requiring prompt attention

Lack of ascertainment of unique patients

Although the Health Insurance Portability and Accountability Act (HIPAA) of 1996 initially required the creation of a unique health identifier [22], Congress has since prohibited the use of federal funds to implement unique universal patient identifiers due to privacy concerns. [23] Our failure to implement national, unique identifiers linking a patient’s data to their healthcare professionals and their health information systems leads to unlinked, incomplete, and often duplicated records, and is another significant source of data quality problems that have been avoided in countries that have implemented unique identifiers. [23] In addition, it is still nearly impossible for a person to access their own vaccination records if they are in databases separate from their EHR records or were submitted by paper or fax. It is also difficult or impossible to carry out early cancer prevention studies [24] that require that complete clinical information be linked to the correct patients even when they change health providers.

Although the prospect of unique patient identifiers raises valid privacy concerns, it can be argued that it would be easier to monitor and protect privacy with a single, properly encoded universal identifier than with a multitude of poorly documented ones. The absence of a unique identifier is actually one of the greatest causes of invasion of privacy, because typically over half of the EHRs in an institution will mistakenly include someone else’s data (personal communication from Dr. W. Ed Hammond) that may be identifiable.

The current reliance on data aggregation techniques to protect patient privacy significantly delays our access to the information and impedes our understanding of the trajectory of diseases in individual patients, with potentially adverse consequences for their medical care and for identifying critical patient-level variables for subsequent research studies. We must therefore invest in better and updated privacy protection systems and law enforcement solutions. As data scientists, we are concerned about the limitations of HIPAA for privacy protection, due to the ease with which such data can be re-identified. Our laws and regulations need to balance individual privacy protection with making data available for improving health outcomes. At a minimum, the approach to governance we adopt must ensure the following (a minimal sketch of an audit-trail record follows the list):

  • the system is able to identify and control who can have the authorized level of access to the medical records;
  • every user has a unique ID and a secure password;
  • audit trails are used to track every user activity, and to provide accountability;
  • only authorized personnel can access audit trails, and assess who has accessed or modified a record; and
  • the data storage provider is not able to access personal identifiable information.
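
A minimal sketch of the kind of audit-trail record these requirements imply is shown below; the field names are hypothetical, and a production system would write entries to append-only, access-controlled storage rather than an in-memory list.

  # A minimal sketch (hypothetical field names) of an audit trail entry:
  # who touched which record, when, and how, logged for every access
  # or modification so that activity is accountable.
  from dataclasses import dataclass, asdict
  from datetime import datetime, timezone

  @dataclass(frozen=True)
  class AuditEntry:
      user_id: str        # unique ID of the authenticated end user
      patient_id: str     # unique identifier of the record's subject
      record_id: str      # the specific record touched
      action: str         # e.g., "read", "update", "export"
      timestamp: str      # UTC time of the event

  def log_access(user_id, patient_id, record_id, action, trail):
      """Append an immutable entry; only authorized auditors read `trail`."""
      trail.append(AuditEntry(user_id, patient_id, record_id, action,
                              datetime.now(timezone.utc).isoformat()))

  trail = []
  log_access("clinician-042", "patient-123", "obs-7", "read", trail)
  print(asdict(trail[0]))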

A single patient identifier also offers health equity benefits. Patients who are poorer typically have less insurance coverage or none at all, and they often switch healthcare systems. They are underrepresented in health information systems and research studies, and less likely to have their specific needs understood. A unique identifier should improve the representation of these patients in our systems and thus our ability to address health inequities.

Lack of information about patient mortality

The inadequacy of our current system for data collection is well illustrated by our failure to collect data as fundamental as mortality in a standardized fashion. Fatal outcomes are not incorporated into the medical record unless death occurs during hospitalization. When needed for public health measures, epidemiological studies, and other research, data on death may be obtained from private services that collect information from funeral homes and obituaries, from disease registries unconnected to EHRs, or from the National Death Index website. The index is typically late in gathering mortality information, because that information is collected by a multitude of disparate local and state systems before being reported to the National Center for Health Statistics. Comprehensive data on mortality and cause of death should be methodically linked to clinical data for the over 330 million individuals in the US (as we have begun to do for COVID-19 cases). This information will allow for the creation of focused decision support systems for clinical data that are better designed to prevent serious and fatal medical errors, one of the top causes of death in US hospitals. [25]

Poorly codified and calibrated clinical laboratory data

Clinical laboratories began collecting digitized data in the 1960s. Although these data support 60 to 70 percent of decisions related to diagnosis, treatment, hospital admission, and discharge, they remain poorly codified, complicated to process, and are underused for medical decision-making and research.

US-based programs that defined the minimum government standards for EHRs have offered laboratories incentives to adopt proposed standards for messaging and encoding laboratory data. Unfortunately, serious functional problems still exist with the coding of laboratory test identifiers. There are multiple ways for the same analytes to be represented by different labs and instruments, and this results in improper assessments of coded terms and incorrect code selection and categorization. Moreover, coding systems often do not allow for transparent incorporation and transmission of the limits of detection of a test, the presence of interfering substances, and how a particular analyte is measured. Also, failure to enforce the use of consistent quantitative units of measure is a frequent source of data errors.
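
To illustrate the stakes, the sketch below normalizes a glucose result reported in two different units; pooled naively, the same physiological value would differ roughly 18-fold. The conversion factor follows from glucose's molar mass of roughly 180.16 g/mol, and unmapped units are rejected rather than guessed.

  # A minimal sketch of why unenforced units corrupt aggregation: the
  # same glucose result reported in mg/dL by one lab and mmol/L by
  # another differs ~18-fold if pooled without conversion.

  MGDL_PER_MMOLL = 18.016  # glucose-specific factor; analyte-dependent

  def to_mmol_per_l(value, unit):
      """Normalize a glucose result to mmol/L; reject unknown units."""
      if unit == "mmol/L":
          return value
      if unit == "mg/dL":
          return value / MGDL_PER_MMOLL
      raise ValueError(f"unmapped unit: {unit!r}")  # surface, don't guess

  results = [(97.0, "mg/dL"), (5.4, "mmol/L")]      # same patient, two labs
  normalized = [to_mmol_per_l(v, u) for v, u in results]
  print([round(x, 2) for x in normalized])           # [5.38, 5.4] -- comparable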

There is a pressing need for an expanded infrastructure to support the collection and distribution of the stable reference standards needed to support the accurate calibration and safe integration of the results from equivalent tests measuring the same analyte, performed by different instrument platforms or laboratories. [26,27] The ONC describes this problem as follows:

Harmonization status indicates calibration equivalencies of tests and is required to verify clinical interoperability of results. Tests that are harmonized may be interpreted and trended together, and may use the same calculations, decision support rules, and machine learning models. Tests that are not harmonized should be interpreted and processed individually, not in aggregate with other tests. [3]

This infrastructure will simplify the identification of a natural functional interoperability pathway that can be used as a backbone for integrating the currently unwieldy, inconsistent, and incomplete data coding standards for laboratory data. An illustration of the consequences of the failure to fully standardize laboratory data collection and calibration of the results is the limited understanding of the evolving prevalence of COVID-19, due to the inability to account for the performance differences of the over 1,000 SARS-CoV-2 diagnostics that are listed worldwide. [28] We also need to understand their performance characteristics according to the particular purpose for which a test is being performed (e.g., permission to travel, to access specific facilities, etc.). [29]
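
One standard way to account for such performance differences, not taken from the article itself, is the Rogan-Gladen estimator, which corrects an apparent test-positivity rate for a test's sensitivity and specificity. The sketch below shows how the same 8% positivity rate implies different underlying prevalence under two hypothetical assays.

  # The Rogan-Gladen correction (a standard epidemiological estimator,
  # offered here as an illustration) shows why test performance must be
  # known before positivity rates from different assays can be compared:
  # true prevalence = (apparent + spec - 1) / (sens + spec - 1).

  def rogan_gladen(apparent, sensitivity, specificity):
      """Estimate true prevalence from an apparent test-positive rate."""
      est = (apparent + specificity - 1) / (sensitivity + specificity - 1)
      return min(max(est, 0.0), 1.0)  # clamp to a valid proportion

  # The same 8% positivity rate under two hypothetical assays with
  # different performance characteristics:
  print(round(rogan_gladen(0.08, sensitivity=0.95, specificity=0.99), 4))  # ~0.0745
  print(round(rogan_gladen(0.08, sensitivity=0.80, specificity=0.97), 4))  # ~0.0649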

Business practices that hinder modernization

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation and grammar to improve readability. In some cases important information was missing from the references, and that information was added.