Full article title: Making big data useful for health care: A summary of the inaugural MIT Critical Data Conference
Journal: JMIR Medical Informatics
Author(s): Badawi, O.; Brennan, T.; Celi, L.A.; Feng, M.; Ghassemi, M.; Ippolito, A.; Johnson, A.; Mark, R.G.; Mayaud, L.; Moody, G.; Moses, C.; Naumann, T.; Nikore, V.; Pimentel, M.; Pollard, T.J.; Santos, M.; Stone, D.J.; Zimolzak, A.
Author affiliation(s): Massachusetts Institute of Technology
Primary contact: Leo Anthony Celi, MD, MPH, MS - Phone: 1.617.253.7937; Email: lceli@mit.edu
Year published: 2014
Volume and issue: 2 (2)
Page(s): e22
DOI: 10.2196/medinform.3447
ISSN: 2291-9694
Distribution license: Creative Commons Attribution 2.0
Website: http://medinform.jmir.org/2014/2/e22/

Abstract

With growing concerns that big data will only augment the problem of unreliable research, the Laboratory of Computational Physiology at the Massachusetts Institute of Technology organized the Critical Data Conference in January 2014. Thought leaders from academia, government, and industry across disciplines — including clinical medicine, computer science, public health, informatics, biomedical research, health technology, statistics, and epidemiology — gathered and discussed the pitfalls and challenges of big data in health care. The key message from the conference is that the value of large amounts of data hinges on the ability of researchers to share data, methodologies, and findings in an open setting. If empirical value is to be derived from the analysis of retrospective data, groups must continuously work together on similar problems to create more effective peer review. This will lead to improvement in methodology and quality, with each iteration of analysis resulting in more reliability.

Keywords: big data; open data; unreliable research; machine learning; knowledge creation

Introduction

Failure to store, analyze, and utilize the vast amount of data generated during clinical care has restricted both quality of care and advances in the practice of medicine. Other industries, such as finance and energy, have already embraced data analytics for the purpose of learning. While such innovations remain relatively limited in the clinical domain, interest in “big data in clinical care” has dramatically increased. This is due partly to the widespread adoption of electronic medical record (EMR) systems and partly to the growing awareness that better data analytics are required to manage the complex enterprise of the health care system. For the most part, however, the clinical enterprise has not had to address the problems particular to “big data” because it has not yet satisfactorily addressed more fundamental data management issues. It is now becoming apparent that we are on the cusp of a great transformation that will incorporate data and data science integrally within the health care domain. In addition to the necessary major digital enhancements of the retrospective analyses that have variably been in place, real-time and predictive analytics will also become ubiquitous core functionalities in the more firmly data-based environment of the (near) future. The initial Massachusetts Institute of Technology (MIT) Critical Data Conference was conceived and conducted to address the many data issues involved in this important transformation.[1][2]

Increasing interest in creating the clinical analog of “business intelligence” has made evident the necessity of developing and nurturing a clinical culture that can manage and translate data-based findings, including those from “big data” studies. Combining this improved secondary use of clinical data with a data-driven approach to learning will enable this new culture to close the clinical data feedback loop facilitating better and more personalized care. Authors have noted several hallmarks of “big data”: very large datasets, a large number of unrelated and/or unstructured datasets, or high speed or low latency of data creation.[3][4] The intensive care unit (ICU) provides a potent example of a particularly data rich clinical domain with the potential for both clinical and financial benefits if these large amounts of data can be harnessed and systematically leveraged into guiding practice. Thus, we use the term “Critical Data” to refer to big data in the setting of the ICU.

This paper summarizes the lectures and group discussions that took place during the recent Critical Data Conference at MIT, Cambridge MA, on January 7, 2014. The conference was the second part of a two-part event that brought together clinicians, data scientists, statisticians and epidemiologists.

The event opened with a “data marathon” on January 3-5, 2014 (Figure 1), which brought together teams of data scientists and clinicians to mine the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) database (version II). MIMIC II is an open-access database consisting of over 60,000 recorded ICU stays from the adult intensive care units at the Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA.[5] Over 100 people participated in the two-day data marathon, and posters of the projects were displayed at the Critical Data Conference.

The Critical Data Conference on January 7 was an approximately ten-hour program comprising two keynote addresses (Jeffrey Drazen, MD and John Ioannidis, MD, PhD), seven individual lectures, three panel discussions, and two poster sessions (Figure 2). The overall conference theme was meaningful secondary use of big data from critical care settings. Materials from the conference (program, slides, and videos) are available online at the MIT Critical Data conference site.[6]


Figure 1. Presentation at the Critical Data Marathon. Photo credit: Andrew Zimolzak.


Figure 2. Critical Data poster session. Photo credit: Andrew Zimolzak.

The problem

In his keynote address, Jeffrey Drazen, MD, Editor-in-Chief of the New England Journal of Medicine, noted that the number of evidence-based recommendations built on randomized controlled trials (RCTs), the current gold standard for data quality, is insufficient to address the majority of clinical decisions. Consequently, clinicians are often left to practice medicine “blindly.” Without the knowledge generation required to capture the decisional factors involved in realistic clinical scenarios, clinical decision making is often less data-driven than determined by the “play of chance” buttressed by past experience. Historically, a doctor took a history, performed a physical examination and made a diagnosis based on what he or she observed. As technology and medical theory progressed, tools such as laboratory testing and imaging modalities helped mitigate chance in the diagnosis of disease. Rote application of existing knowledge is not enough, as physicians want to establish causality. Until now this has been done with theories, but moving forward, theories will be inadequate unless they are confirmed, translated into practice, and systematically disseminated in clinical care.

This trial-and-error process continues today because data generated from routine care is most often not captured and is rarely disseminated for the purpose of improving population health. Even in information-rich care settings like the ICU, the knowledge necessary to mitigate the play of chance is lacking.[7][8] As such, the ICU provides a fertile ground for potential improvement. Specifically, Drazen suggested a potential role for clinical data mining to answer questions that cannot be answered using RCTs.[9] This approach would likely yield benefits both more quickly and with fewer resources.

Drazen concluded with the question “At what point is data good enough?” Documented associations may be strong but not sufficiently “proven” to establish causality. Drazen drew a comparison with experimental physicists who hone future studies on the work of theorists as well as on prior experimental results: biomedical informaticians can identify meaningful associations that can then guide design of new RCTs where data quality can be increased by controlling for potential confounders. This will require cross-disciplinary collaboration of frontline clinicians, medical staff, database engineers and biomedical informaticians, in addition to strong partnerships with health information system vendors in order to close the loop from knowledge discovery during routine care to the real-time application of best care for populations.

Secondary usage of clinical data

Charles Safran, MD, MS, Chief of the Division of Clinical Computing at the BIDMC and Harvard Medical School, spoke next, sharing the dream of evidence-based medicine (EBM): the ideal situation in which quality evidence would exist to guide clinicians through all the conundrums faced on a near-daily basis (eg, which test to order, how to interpret the test results, and what therapy to institute). For the last half-century, prospective RCTs have been the gold standard in EBM. Safran noted, as Drazen had, that such trials suffer from a number of limitations including economic burdens and design limitations. An RCT can only address a severely limited bundle of particularly well-posed clinical questions. For many clinical situations, it is either unethical or even impossible to proceed with an RCT. Furthermore, the inclusion and exclusion criteria often limit the generalizability of an RCT study and, given the time it normally takes to run an RCT, it is very difficult for these studies to remain current with the rapidly evolving practice of medicine.

Can another approach avoid at least some of the limitations of RCTs? Safran suggested that retrospective observational studies (ROS) utilizing EMR data are a promising avenue for generating EBM. Digital records contain extensive clinical information including medical history, diagnoses, medications, immunization dates, allergies, radiology images, and laboratory and test results. Consequently, routinely collected EMR data contains the rich, continuous and time-sensitive information needed to support clinical decision making and evidence generation.[10] However, despite the many potential benefits, Safran pointed out that secondary use of EMR data is still subject to limitations: EMR data were not collected primarily for the purpose of evidence generation and data analytics but for real-time and longitudinal patient care.[11] As a result, EMR data are often poorly structured, disorganized, unstandardized, and contaminated with errors, artifacts, and missing values.
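The data-quality problems Safran describes can be made concrete with a small sketch. The following hypothetical Python example profiles missingness and physiologically implausible entries in a toy EMR extract; the variable names, plausibility ranges, and records are all invented for illustration, not drawn from any real system:

```python
# Hypothetical sketch: profiling common EMR data-quality problems
# (missing values, physiologically implausible artifacts) in a toy extract.
# Column names and plausibility ranges are invented for illustration.

rows = [
    {"patient_id": 1, "heart_rate": 72,   "lactate": 1.1},
    {"patient_id": 2, "heart_rate": None, "lactate": 2.4},   # missing HR
    {"patient_id": 3, "heart_rate": 400,  "lactate": None},  # artifact + missing
]

PLAUSIBLE = {"heart_rate": (20, 250), "lactate": (0.0, 30.0)}

def profile(rows):
    """Count missing and out-of-range values per variable."""
    report = {v: {"missing": 0, "implausible": 0} for v in PLAUSIBLE}
    for row in rows:
        for var, (lo, hi) in PLAUSIBLE.items():
            val = row.get(var)
            if val is None:
                report[var]["missing"] += 1
            elif not lo <= val <= hi:
                report[var]["implausible"] += 1
    return report

print(profile(rows))
# → {'heart_rate': {'missing': 1, 'implausible': 1},
#    'lactate': {'missing': 1, 'implausible': 0}}
```

A profiling pass of this kind is typically a prerequisite step before any retrospective observational analysis of routinely collected EMR data.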

Safran echoed one of Drazen’s points by proposing that we should combine the usage of prospective RCTs and ROS in such a way that each complements the limitations of the other. Furthermore, he suggested the possibility of incorporating additional novel sources of data such as social media data, health data from portable sensors, and genetic data. While there are many barriers to establishing such a comprehensive framework, a big data picture of clinical, genetic, and treatment variables holds promise in revolutionizing diagnosis and treatment.

Connecting patients, providers, and payers

For John Halamka, MD, MS, the Chief Information Officer of BIDMC, working with big data in hospital systems is hugely challenging but at the same time holds tremendous promise in providing more meaningful information to help clinicians treat patients across the continuum of care. In his position, Halamka has been tasked to aggregate data in novel ways in order to provide better care for BIDMC’s patient population. One opportunity for furthering “big data in health care” is to normalize the data collected via their EMR system and store it in large, centralized databases. In turn, analytic tools can then be applied to identify and isolate the quality data reporting measures required to participate as an Accountable Care Organization (ACO) under the Affordable Care Act.

Halamka emphasized that building these large datasets does not intrinsically provide value from the start, stating that “workflow is disparate, the vocabulary is disparate, and the people are disparate.” Therefore, the normalization of data and its distillation into standard schemas are difficult due to discrepancies across longitudinal data. Further, since each vendor models concepts differently, there must be an emphasis on developing a “least common denominator” concept map across vendors’ offerings.
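The “least common denominator” concept map Halamka describes can be sketched as a translation table from vendor-specific codes to shared concept IDs. The vendor names, local codes, and concept identifiers below are invented for illustration:

```python
# Hypothetical sketch of "least common denominator" normalization:
# map each vendor's local code for a concept onto a shared concept ID.
# Vendor names, codes, and concept IDs are invented for illustration.

CONCEPT_MAP = {
    ("vendor_a", "HR"):      "heart_rate",
    ("vendor_a", "GLU-SER"): "serum_glucose",
    ("vendor_b", "PULSE"):   "heart_rate",
    ("vendor_b", "GLUCOSE"): "serum_glucose",
}

def normalize(vendor, record):
    """Translate a vendor-specific record into the shared schema,
    dropping fields with no agreed common-denominator concept."""
    out = {}
    for code, value in record.items():
        concept = CONCEPT_MAP.get((vendor, code))
        if concept is not None:
            out[concept] = value
    return out

# A proprietary field with no shared mapping is dropped from the output.
print(normalize("vendor_b", {"PULSE": 80, "GLUCOSE": 5.4, "PROPRIETARY_X": 1}))
# → {'heart_rate': 80, 'serum_glucose': 5.4}
```

Dropping unmapped fields is exactly the “least common denominator” tradeoff: the shared schema keeps only what every vendor can express, at the cost of vendor-specific detail.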

Nevertheless, through this normalization effort, doctors can utilize “scorecards” to evaluate their own patient population within and across the different payment models, such as Blue Cross Blue Shield’s Alternative Quality Contract measures, the Centers for Medicare and Medicaid Services (CMS) Physician Quality Reporting System measures, and the CMS ACO measures. In addition, physicians can query this dataset to identify the most effective treatment regimens. However, such queries do pose privacy and security issues in the hospital setting, and these risks are further complicated by hospital staff utilizing personal mobile devices such as cell phones, laptops, and tablets.

Creating a data-driven learning system

The problem posed to the first panel (Figure 3), comprising Gari Clifford, PhD, Perren Cobb, MD, and Joseph Frassica, MD, and moderated by Leo Anthony Celi, MD, MS, MPH, was how to create a data-driven learning system in clinical practice.[8] Privacy concerns were cited as the central barrier, as there is a tradeoff between re-identification risk and the value of sharing. Furthermore, recent work shows patients are reluctant to share for certain purposes such as marketing, pharmaceutical, and quality improvement measures, indicating a need for public education about the benefits of data sharing and that shared data can be utilized without being used for marketing and other unwanted purposes.[2]
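One common way to reason about the re-identification risk the panel raised is a k-anonymity check: count how many records share each combination of quasi-identifiers, since a record that is unique on those fields is easiest to re-identify. The field names and records below are illustrative, not from any real dataset:

```python
# Hypothetical sketch of the re-identification/sharing tradeoff: a simple
# k-anonymity check counting how many records share each combination of
# quasi-identifiers. Field names and values are invented for illustration.

from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values; small groups are easiest to re-identify."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"age_band": "60-69", "zip3": "021", "sex": "F"},
    {"age_band": "60-69", "zip3": "021", "sex": "F"},
    {"age_band": "70-79", "zip3": "021", "sex": "M"},  # unique combination
]

print(smallest_group(records, ["age_band", "zip3", "sex"]))
# → 1, so this release is not even 2-anonymous
```

Coarsening the quasi-identifiers (wider age bands, shorter ZIP prefixes) raises the minimum group size but discards detail, which is the privacy-versus-value tradeoff in miniature.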

There is also tension between intellectual property rights and transparency. Resolution of this may require collaboration between government, industry and academic institutions, as seen with the US Critical Illness and Injury Trials Group.[12] There is also a risk that data sharing will make authors reluctant to write audacious or unconventional papers (as did Reinhart and Rogoff[13]), if data sharing puts such papers at perceived higher risk of refutation (such as the refutation of Herndon et al[14]).

Finally, the panel raised concerns about the high quantity but perceived low quality of the data that is actually captured. While there is hope that automatically captured data may be more accurate than manually entered data, there is also some risk that doing so will introduce additional noise, furthering the problem of quantity over quality. This concern poses the challenge of capturing more and higher quality data in order to promote reproducibility. Panelists observed that multidisciplinary conferences like the Critical Data Conference are especially beneficial in this regard, as they provide an opportunity for clinicians and data scientists to better understand the relation between real-world activity and the data that such activity generates.


Figure 3. Data-driven learning system panel. Photo credit: Andrew Zimolzak.

Physician culture as a barrier to spread of innovation

In the following panel moderated by critical care physician Leo Anthony Celi, MD, MS, MPH, fellow intensivists Djillali Annane, MD, PhD, Peter Clardy, MD, and Taylor Thompson, MD reflected on the barriers presented by the current clinician culture toward the goal of data-driven innovation in medicine (Figure 4). The panelists observed that historically, EBM was perceived to be incompatible with well-established observational trials and experience, perhaps instilling a residual degree of resistance. Consequently, echoing Safran’s sentiments, it will be increasingly important that “big data” is understood as a complement to RCTs and (patho)physiologic studies. Furthermore, condensing and filtering the vast quantity of data to make it applicable at the bedside will be key to adoption. The specific inclusion of clinicians during the design process will help to deter the creation of tools that inundate staff with extraneous information and burdensome extra tasks. Likewise, incorporating “big data” into medical education, in a way that lets students and resident trainees understand its importance both in everyday care and in expediting research, is vital.

While the panel agreed that more evidence is required to determine whether big data can facilitate comparative effectiveness research, it was acknowledged that it is necessary to investigate this alternative since RCTs do not, and will not, provide answers to an important fraction of the decisions required on a daily basis. Scaling up RCTs to account for the thousands of decisions each day is not feasible, so big data approaches may provide the most effective way to fill these gaps. For example, three groups currently leading clinical trials research in the analysis of fluid resuscitation in critically ill patients have collaborated to create a common database architecture to allow for individual patient meta-analysis and for these trials to be evaluated in aggregate by an external monitoring committee.
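The common database architecture for individual patient meta-analysis mentioned above amounts to stacking per-trial patient records into one table while preserving trial provenance. A minimal hypothetical sketch (trial names, fields, and values are invented for illustration):

```python
# Hypothetical sketch of pooling individual patient data from several trials
# into one common table for individual-patient meta-analysis. Trial names,
# field names, and values are invented for illustration.

def pool_trials(trials):
    """Stack per-trial patient records into one table, tagging each row
    with its source trial so trial-level effects can still be modeled."""
    pooled = []
    for trial_name, patients in trials.items():
        for p in patients:
            pooled.append({"trial": trial_name, **p})
    return pooled

trials = {
    "trial_a": [{"patient_id": 1, "fluid_ml": 2000, "survived": True}],
    "trial_b": [{"patient_id": 1, "fluid_ml": 3500, "survived": False}],
}

pooled = pool_trials(trials)
print(len(pooled))  # → 2 rows, each tagged with its source trial
```

Tagging each row with its source trial matters because patient IDs collide across trials, and because an external monitoring committee evaluating the trials in aggregate still needs to model between-trial differences.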


Figure 4. Physician culture panel. Photo credit: Andrew Zimolzak.

References

  1. Celi, L.A.; Mark, R.G.; Stone, D.J.; Montgomery, R.A. (June 2013). ""Big data" in the intensive care unit: Closing the data loop". American Journal of Respiratory and Critical Care Medicine 187 (11): 1157–1160. doi:10.1164/rccm.201212-2311ED. PMC3734609. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3734609.
  2. Grande, D.; Mitra, N.; Shah, A.; Wan, F.; Asch, D.A. (October 2013). "Public preferences about secondary uses of electronic health information". JAMA Internal Medicine 173 (19): 1798–1806. doi:10.1001/jamainternmed.2013.9166. PMC4083587. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083587.
  3. McAfee, A.; Brynjolfsson, E. (October 2012). "Big data: the management revolution". Harvard Business Review 90 (10): 60–6, 68, 128. PMID 23074865.
  4. Bourne, P.E. (2014). "What Big Data means to me". Journal of the American Medical Informatics Association 21 (2): 194. doi:10.1136/amiajnl-2014-002651. PMC3932474. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3932474.
  5. "The MIMIC II project". Massachusetts Institute of Technology. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkH0iGr1. Retrieved 11 August 2014.
  6. "Critical data: Empowering big data in critical care". Massachusetts Institute of Technology. Archived from the original on 20 August 2014. http://www.webcitation.org/6Rxk41BBR. Retrieved 20 August 2014.
  7. Freedman, D.H. (4 October 2010). "Lies, damned lies, and medical science". The Atlantic. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkHK88k4. Retrieved 11 August 2014.
  8. "Trouble at the lab". The Economist. 19 October 2013. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkHeZm6r. Retrieved 11 August 2014.
  9. Moses, C.; Celi, L.A.; Marshall, J. (June 2013). "Pharmacovigilance: an active surveillance system to proactively identify risks for adverse events". Population Health Management 16 (3): 147–149. doi:10.1089/pop.2012.0100. PMID 23530466.
  10. Safran, C.; Bloomrosen, M.; Hammond, W.E.; Labkoff, S.; Markel-Fox, S.; Tang, P.C. (January 2007). "Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper". Journal of the American Medical Informatics Association 14 (1): 1–9. doi:10.1197/jamia.M2273. PMC2329823. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2329823.
  11. Geissbuhler, A.; Safran, C.; Buchan, I.; Bellazzi, R.; Labkoff, S.; Eilenberg, K.; et al. (January 2013). "Trustworthy reuse of health data: a transnational perspective". International Journal of Medical Informatics 82 (1): 1–9. doi:10.1016/j.ijmedinf.2012.11.003. PMID 23182430.
  12. Cobb, J.P.; Cairns, C.B.; Bulger, E.; Wong, H.R.; Parsons, P.E.; Angus, D.C.; et al. (August 2009). "The United States critical illness and injury trials group: an introduction". The Journal of Trauma 67 (2 Suppl): S159–S160. doi:10.1097/TA.0b013e3181ad3473. PMID 19667851.
  13. Reinhart, C.M.; Rogoff, K.S. (2011). "A Decade of Debt - Working Paper 16827" (PDF). National Bureau of Economic Research. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkHujrwl. Retrieved 11 August 2014.
  14. Herndon, T.; Ash, M.; Pollin, R. (2013). "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff - Working Paper Series No. 322" (PDF). Political Economy Research Institute. Archived from the original on 11 August 2014. http://www.webcitation.org/6RkI0b5Rf. Retrieved 11 August 2014.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation.

Per the distribution agreement, the following copyright information is also being added:

©Omar Badawi, Thomas Brennan, Leo Anthony Celi, Mengling Feng, Marzyeh Ghassemi, Andrea Ippolito, Alistair Johnson, Roger G Mark, Louis Mayaud, George Moody, Christopher Moses, Tristan Naumann, Vipan Nikore, Marco Pimentel, Tom J Pollard, Mauro Santos, David J Stone, Andrew Zimolzak, MIT Critical Data Conference 2014 Organizing Committee. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 22.08.2014.