Journal:Secure record linkage of large health data sets: Evaluation of a hybrid cloud model

From LIMSWiki
Revision as of 21:45, 21 December 2020 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title: Secure record linkage of large health data sets: Evaluation of a hybrid cloud model
Journal: JMIR Medical Informatics
Author(s): Brown, Adrian P.; Randall, Sean M.
Author affiliation(s): Curtin University
Primary contact (email): adrian dot brown at curtin dot edu dot au
Year published: 2020
Volume and issue: 8(9)
Article #: e18920
DOI: 10.2196/18920
ISSN: 2291-9694
Distribution license: Creative Commons Attribution 4.0 International
Website: https://medinform.jmir.org/2020/9/e18920/
Download: https://medinform.jmir.org/2020/9/e18920/pdf (PDF)

Abstract

Background: The linking of administrative data across agencies provides the capability to investigate many health and social issues, with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage of identifiable information on cloud infrastructure assessed as high-risk by data custodians.

Objective: This study aims to present a model for record linkage that utilizes cloud computing capabilities while assuring custodians that identifiable data sets remain secure and local.

Methods: A new hybrid cloud model was developed, including privacy-preserving record linkage techniques and container-based batch processing. An evaluation of this model was conducted with a prototype implementation using large synthetic data sets representative of administrative health data.

Results: The cloud model kept identifiers on-premises and used privacy-preserved identifiers to run all linkage computations on cloud infrastructure. Our prototype used a managed container cluster in Amazon Web Services to distribute the computation using existing linkage software. Although the cost of computation was relatively low, the use of existing software resulted in a processing overhead of 35.7% (149 of 417 minutes of execution time).

Conclusions: The result of our experimental evaluation shows the operational feasibility of such a model and the exciting opportunities for advancing the analysis of linkage outputs.

Keywords: cloud computing, medical record linkage, confidentiality, data science

Introduction

Background

In the last 10 years, innovative development of software applications, wearables, and the internet of things has changed the way we live. These technological advances have also changed the way we deliver health services and provide a rapidly expanding information resource, with the potential for data-driven breakthroughs in the understanding, treatment, and prevention of disease. Additional information from patient-related devices, such as mobile phone and Google search histories[1], wearable devices[1], and mobile phone apps[2], provides opportunities for monitoring, managing, and improving health outcomes in new and innovative ways. The key to unlocking these data is in relating details at the individual patient level to provide an understanding of risk factors and appropriate interventions.[3] The linking, integration, and analysis of these data have recently been described as "population data science."[4]


References

  1. Abebe, R.; Hill, S.; Vaughan, J.W. et al. (2019). "Using Search Queries to Understand Health Information Needs in Africa". Proceedings of the Thirteenth International AAAI Conference on Web and Social Media 13 (1): 3–14. https://ojs.aaai.org/index.php/ICWSM/article/view/3360.
  2. Lai, S.; Farnham, A.; Ruktanonchai, N.W. et al. (2019). "Measuring mobility, disease connectivity and individual risk: A review of using mobile phone data and mHealth for travel medicine". Journal of Travel Medicine 26 (3): taz019. doi:10.1093/jtm/taz019. PMC PMC6904325. PMID 30869148. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6904325. 
  3. Khoury, M.J.; Iademarco, M.F.; Riley, W.T. (2016). "Precision Public Health for the Era of Precision Medicine". American Journal of Preventive Medicine 50 (3): 398–401. doi:10.1016/j.amepre.2015.08.031. PMC PMC4915347. PMID 26547538. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915347.
  4. McGrail, K.; Jones, K. (2018). "Population Data Science: The science of data about people". Conference Proceedings for International Population Data Linkage Conference 2018 3 (4). doi:10.23889/ijpds.v3i4.918. 

Notes

This version is faithful to the original, with only a few minor changes to presentation. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added. At the time this article was loaded, the links to Additional Files 1 and 2 were broken on the original site; a request to fix the errors has been sent to the journal.