Journal:Making big data useful for health care: A summary of the inaugural MIT Critical Data Conference

Full article title: Making big data useful for health care: A summary of the inaugural MIT Critical Data Conference
Journal: JMIR Medical Informatics
Author(s): Badawi, O.; Brennan, T.; Celi, L.A.; Feng, M.; Ghassemi, M.; Ippolito, A.; Johnson, A.; Mark, R.G.; Mayaud, L.; Moody, G.; Moses, C.; Naumann, T.; Nikore, V.; Pimentel, M.; Pollard, T.J.; Santos, M.; Stone, D.J.; Zimolzak, A.
Author affiliation(s): Massachusetts Institute of Technology
Primary contact: Leo Anthony Celi, MD, MPH, MS - Phone: 1.617.253.7937; Email: lceli@mit.edu
Year published: 2014
Volume and issue: 2 (2)
Page(s): e22
DOI: 10.2196/medinform.3447
ISSN: 2291-9694
Distribution license: Creative Commons Attribution 2.0
Website: http://medinform.jmir.org/2014/2/e22/

Abstract

With growing concerns that big data will only augment the problem of unreliable research, the Laboratory of Computational Physiology at the Massachusetts Institute of Technology organized the Critical Data Conference in January 2014. Thought leaders from academia, government, and industry across disciplines (including clinical medicine, computer science, public health, informatics, biomedical research, health technology, statistics, and epidemiology) gathered and discussed the pitfalls and challenges of big data in health care. The key message from the conference is that the value of large amounts of data hinges on the ability of researchers to share data, methodologies, and findings in an open setting. If empirical value is to be derived from the analysis of retrospective data, groups must continuously work together on similar problems to create more effective peer review. This will lead to improvement in methodology and quality, with each iteration of analysis resulting in more reliability.

Keywords: big data; open data; unreliable research; machine learning; knowledge creation

Introduction

Failure to store, analyze, and utilize the vast amount of data generated during clinical care has restricted both quality of care and advances in the practice of medicine. Other industries, such as finance and energy, have already embraced data analytics for the purpose of learning. While such innovations remain relatively limited in the clinical domain, interest in “big data in clinical care” has dramatically increased. This is due partly to the widespread adoption of electronic medical record (EMR) systems and partly to the growing awareness that better data analytics are required to manage the complex enterprise of the health care system. For the most part, however, the clinical enterprise has not had to address the problems particular to “big data” because it has not yet satisfactorily addressed more fundamental data management issues. It is now becoming apparent that we are on the cusp of a great transformation that will incorporate data and data science integrally within the health care domain. In addition to the necessary major digital enhancements of the retrospective analyses that have variably been in place, real-time and predictive analytics will also become ubiquitous core functionalities in the more firmly data-based environment of the (near) future. The initial Massachusetts Institute of Technology (MIT) Critical Data Conference was conceived and conducted to address the many data issues involved in this important transformation.[1][2]

Increasing interest in creating the clinical analog of “business intelligence” has made evident the necessity of developing and nurturing a clinical culture that can manage and translate data-based findings, including those from “big data” studies. Combining this improved secondary use of clinical data with a data-driven approach to learning will enable this new culture to close the clinical data feedback loop, facilitating better and more personalized care. Authors have noted several hallmarks of “big data”: very large datasets, a large number of unrelated and/or unstructured datasets, or high speed or low latency of data creation.[3][4] The intensive care unit (ICU) provides a potent example of a particularly data-rich clinical domain with the potential for both clinical and financial benefits if these large amounts of data can be harnessed and systematically leveraged to guide practice. Thus, we use the term “Critical Data” to refer to big data in the setting of the ICU.

This paper summarizes the lectures and group discussions that took place during the recent Critical Data Conference at MIT, Cambridge, MA, on January 7, 2014. The conference was the second part of a two-part event that brought together clinicians, data scientists, statisticians, and epidemiologists.

The event opened with a “data marathon” on January 3-5, 2014 (Figure 1), which brought together teams of data scientists and clinicians to mine the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) database (version II). MIMIC II is an open-access database consisting of over 60,000 recorded ICU stays from the adult intensive care units at the Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA.[5] Over 100 people participated in the two-day data marathon, and posters of the projects were displayed at the Critical Data Conference.

The Critical Data Conference on January 7 was an approximately ten-hour program comprising two keynote addresses (Jeffrey Drazen, MD and John Ioannidis, MD, PhD), seven individual lectures, three panel discussions, and two poster sessions (Figure 2). The overall conference theme was meaningful secondary use of big data from critical care settings. Materials from the conference (program, slides, and videos) are available online at the MIT Critical Data conference site.[6]

Figure 1. Presentation at the Critical Data Marathon. Photo credit: Andrew Zimolzak.

Figure 2. Critical Data poster session. Photo credit: Andrew Zimolzak.

The problem

In his keynote address, Jeffrey Drazen, MD, Editor-in-Chief of the New England Journal of Medicine, noted that the number of evidence-based recommendations built on randomized controlled trials (RCTs), the current gold standard for data quality, is insufficient to address the majority of clinical decisions. Consequently, clinicians are often left to practice medicine “blindly.” Without the knowledge generation required to capture the decisional factors involved in realistic clinical scenarios, clinical decision making is often less data-driven than determined by the “play of chance” buttressed by past experience. Historically, a doctor took a history, performed a physical examination, and made a diagnosis based on what he or she observed. As technology and medical theory progressed, new sources of knowledge such as laboratory tests and imaging modalities helped mitigate chance in the diagnosis of disease. Rote application of existing knowledge is not enough, as physicians want to establish causality. Until now this has been done with theories, but moving forward, theories will be inadequate unless they are confirmed, translated into practice, and systematically disseminated.

This trial-and-error process continues today because data generated from routine care is most often not captured and is rarely disseminated for the purpose of improving population health. Even in information-rich care settings like the ICU, the knowledge necessary to mitigate the play of chance is lacking.[7][8] As such, the ICU provides a fertile ground for potential improvement. Specifically, Drazen suggested a potential role for clinical data mining to answer questions that cannot be answered using RCTs.[9] This approach would likely yield benefits both more quickly and with fewer resources.

Drazen concluded with the question, “At what point is data good enough?” Documented associations may be strong but not sufficiently “proven” to establish causality. Drazen drew a comparison with experimental physicists, who hone future studies on the work of theorists as well as on prior experimental results: biomedical informaticians can identify meaningful associations that can then guide the design of new RCTs, where data quality can be increased by controlling for potential confounders. This will require cross-disciplinary collaboration among frontline clinicians, medical staff, database engineers, and biomedical informaticians, in addition to strong partnerships with health information system vendors, in order to close the loop from knowledge discovery during routine care to the real-time application of best care for populations.


References

  1. Celi, L.A.; Mark, R.G.; Stone, D.J.; Montgomery, R.A. (June 2013). "'Big data' in the intensive care unit: Closing the data loop". American Journal of Respiratory and Critical Care Medicine 187 (11): 1157–1160. doi:10.1164/rccm.201212-2311ED. PMC3734609. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3734609.
  2. Grande, D.; Mitra, N.; Shah, A.; Wan, F.; Asch, D.A. (October 2013). "Public preferences about secondary uses of electronic health information". JAMA Internal Medicine 173 (19): 1798–1806. doi:10.1001/jamainternmed.2013.9166. PMC4083587. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083587.
  3. McAfee, A.; Brynjolfsson, E. (October 2012). "Big data: the management revolution". Harvard Business Review 90 (10): 60–6, 68, 128. PMID 23074865.
  4. Bourne, P.E. (2014). "What Big Data means to me". Journal of the American Medical Informatics Association 21 (2): 194. doi:10.1136/amiajnl-2014-002651. PMC3932474. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3932474.
  5. "The MIMIC II project". Massachusetts Institute of Technology. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkH0iGr1. Retrieved 11 August 2014. 
  6. "Critical data: Empowering big data in critical care". Massachusetts Institute of Technology. Archived from the original on 20 August 2014. http://www.webcitation.org/6Rxk41BBR. Retrieved 20 August 2014. 
  7. Freedman, D.H. (4 October 2010). "Lies, damned lies, and medical science". The Atlantic. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkHK88k4. Retrieved 11 August 2014. 
  8. "Trouble at the lab". The Economist. 19 October 2013. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkHeZm6r. Retrieved 11 August 2014. 
  9. Moses, C.; Celi, L.A.; Marshall, J. (June 2013). "Pharmacovigilance: an active surveillance system to proactively identify risks for adverse events". Population Health Management 16 (3): 147–149. doi:10.1089/pop.2012.0100. PMID 23530466. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation.

Per the distribution agreement, the following copyright information is also being added:

©Omar Badawi, Thomas Brennan, Leo Anthony Celi, Mengling Feng, Marzyeh Ghassemi, Andrea Ippolito, Alistair Johnson, Roger G Mark, Louis Mayaud, George Moody, Christopher Moses, Tristan Naumann, Vipan Nikore, Marco Pimentel, Tom J Pollard, Mauro Santos, David J Stone, Andrew Zimolzak, MIT Critical Data Conference 2014 Organizing Committee. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 22.08.2014.