Journal:Challenges and opportunities of big data in health care: A systematic review

From LIMSWiki
Revision as of 22:47, 6 December 2016 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title Challenges and opportunities of big data in health care: A systematic review
Journal JMIR Medical Informatics
Author(s) Kruse, Clemens S.; Goswamy, Rishi; Raval, Yesha; Marawi, Sarah
Author affiliation(s) School of Health Administration, Texas State University
Primary contact Email: scottkruse [at] txstate.edu; Phone: 1 2103554742
Editors Eysenbach, G.
Year published 2016
Volume and issue 4 (4)
Page(s) e38
DOI 10.2196/medinform.5359
ISSN 2291-9694
Distribution license Creative Commons Attribution 2.0
Website http://medinform.jmir.org/2016/4/e38/
Download http://medinform.jmir.org/2016/4/e38/pdf (PDF)

Abstract

Background: Big data analytics offers promise in many business sectors, and health care is looking at big data to provide answers to many age-related issues, particularly dementia and chronic disease management.

Objective: The purpose of this review was to summarize the challenges faced by big data analytics and the opportunities that big data opens in health care.

Methods: A total of three searches were performed for publications between January 1, 2010 and January 1, 2016 (PubMed/MEDLINE, CINAHL, and Google Scholar), and an assessment was made on content germane to big data in health care. From the results of the searches in research databases and Google Scholar (N=28), the authors summarized content and identified nine and 14 themes under the categories "Challenges" and "Opportunities," respectively. We rank-ordered and analyzed the themes based on the frequency of occurrence.

Results: The top challenges were issues of data structure, security, data standardization, storage and transfers, and managerial skills such as data governance. The top opportunities revealed were quality improvement, population management and health, early detection of disease, data quality, structure, and accessibility, improved decision making, and cost reduction.

Conclusions: Big data analytics has the potential for positive impact and global implications; however, it must overcome some legitimate obstacles.

Keywords: big data, analytics, health care, human genome, electronic medical record

Introduction

Rationale

Big data analytics offers promise in many business sectors, and health care is looking at big data to provide answers to many age-related issues, particularly dementia and chronic disease management. This systematic review explores the depth of big data analytics since 2010 and identifies both challenges and opportunities associated with big data in health care. The review follows the standard set by Preferred Reporting Items for Systematic Reviews and Meta-Analyses (2009).[1]

Big data is commonly defined through the four Vs: volume (scale or quantity of data), velocity (speed and analysis of real-time or near-real-time data), variety (different forms of data, often from disparate data sources), and veracity (quality assurance of the data). The first three Vs are found in most literature[2][3], and the fourth V is a goal.[4]

As of 2012, about 2.5 exabytes of data are created each day; Walmart can collect up to 2.5 petabytes of customer-related data per hour.[2] The industry of health care produces and collects data at a staggering speed, but different electronic health records (EHRs) collect data in different structures: structured, unstructured, and semistructured. This variety can pose difficulty when seeking veracity or quality assurance of the data. The EHRs can provide a rich source of data, ripe for analysis to increase our understanding of disease mechanisms, as well as better and personalized health care, but the data structures pose a problem to standard means of analysis.[5]

There are several large sources for big data in health care: genomics, EHR, medical monitoring devices, wearable video devices, and health-related mobile phone apps. Approximately 483 studies on genomics are registered with the U.S. Department of Health and Human Services; these studies are being conducted in nine countries, and they all use portions of the data from the Human Genome Project.[6] The EHR, being adopted in many countries, offers a source of data the depth of which is almost inconceivable. About 500 petabytes of data was generated by the EHR in 2012, and by 2020, the data will reach 25,000 petabytes.[7] The EHR can collect data from other monitoring devices, but the continuous data streams are not consistently saved in the longitudinal record.
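To put the two EHR volume figures above in perspective, the implied annual growth rate can be worked out under the assumption of steady compound growth over the eight years from 2012 to 2020. That assumption is made here for illustration only and is not stated by the cited source.

```latex
\[
  g \;=\; \left(\frac{25{,}000\ \text{PB}}{500\ \text{PB}}\right)^{1/8} - 1
    \;=\; 50^{1/8} - 1
    \;\approx\; 0.63
\]
```

Under that assumption, the projection corresponds to EHR-generated data growing by roughly 63 percent per year.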

The decrease in the cost of storage has enabled exponential growth in data collection, but the ability to analyze this quantity of data is the center of gravity for “big data” in health care. In the United States, financial incentives offered for the “meaningful use” of health information technology have spurred growth in the adoption of the EHR and other enabling health-related technology since 2009.

Health information systems show great potential to improve the efficiency of care delivery, reduce overall costs to the health care system, and markedly improve patient outcomes.[8] The U.S. government has allocated billions of dollars to help the country’s health care market realize some of these efficiencies and savings. Specific provisions of the Health Information Technology for Economic and Clinical Health (HITECH) Act, part of the American Recovery and Reinvestment Act, acknowledge the importance of information technology (IT) in the delivery of health care within the United States.[9] The Act allocates approximately US $17.2 billion in incentives for the adoption and meaningful use of health information technology, part of which involves participation in the electronic exchange of clinical information. In 2010, Congress passed the Health Information Exchange (HIE) Challenge Grant Program, which contributed about US $547.7 million to state HIE programs.[10]

With the implementation of this legislation as well as the technologies associated with it, it is imperative to effectively organize and process the ever-increasing quantity of data that is digitally collected and stored within health care organizations. Other industries such as astronomy, retail, search engines, and politics have developed advanced data-handling capabilities to convert data into knowledge. Health care needs to follow their lead so that decisions can be made in support of organizational objectives and goals.[4][11][12] This evolutionary process of data management is collectively known as big data, and it is essential to the future of adoption and management of health information technology.[13]

Objectives

The purpose of this systematic review is to objectively review articles and studies published in academic journals in order to compile a list of challenges and opportunities faced by big data analytics in health care in the United States. Particular emphasis is placed on age-related applications of big data.

Methods

Eligibility criteria

Articles and studies were eligible for analysis if they were published between 2010 and 2015, published in academic journals, and published in English. The researchers chose the range from 2010 to 2015 for two reasons: HITECH was passed in 2009, and research and other articles on the topic appeared to blossom beginning in 2010. We focused on academic journals for their peer-review quality and to decrease the chance of selecting material about big data from a non-credible source.
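The three eligibility criteria can be expressed as a simple filter. The following is a minimal sketch, not the procedure the authors actually ran: the `Record` class, its field names, and the sample records are all hypothetical and exist only to illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are illustrative."""
    year: int
    source_type: str  # e.g. "academic journal", "magazine"
    language: str     # e.g. "English"


def is_eligible(record: Record) -> bool:
    """Apply the three stated criteria: 2010-2015, academic journal, English."""
    return (
        2010 <= record.year <= 2015
        and record.source_type == "academic journal"
        and record.language == "English"
    )


# Made-up records, one passing and two failing a single criterion each.
candidates = [
    Record(2014, "academic journal", "English"),  # meets all three criteria
    Record(2009, "academic journal", "English"),  # published too early
    Record(2013, "magazine", "English"),          # not an academic journal
]
eligible = [r for r in candidates if is_eligible(r)]
```

In practice these restrictions were applied through database filters rather than code, so the sketch only mirrors the decision rule.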

Information sources

Key terms from Medical Subject Headings (MeSH) were combined with Boolean operators and used in two common research databases, CINAHL and PubMed, together with a general search in Google Scholar (see Figure 1), in January 2016.

These terms were chosen not only because they are the focus of the review, but also because they were identified in the initial research into the definition of big data.



Fig. 1 Literature review process with inclusion and exclusion criteria

Search

The following search string was used in all three searches: ((“big data” AND healthcare) OR (“big data” AND “health care”)). This search string was used in CINAHL, PubMed (MEDLINE), and Google Scholar. In the two research databases, our team was able to restrict the search to academic journals (including other systematic reviews). MEDLINE was excluded in CINAHL because it was already captured in PubMed. Google Scholar makes searching difficult because it offers few of the filters typically associated with academic research. The initial 13,935 results were limited by restricting dates to the last five years, limiting results to academic journals and MEDLINE, and, in Google Scholar, restricting the keyword search to titles. Applying these filters left 121 articles to review.
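The Boolean logic of the search string can be sketched as a small predicate. This is an illustration only: the function name `matches_search_string` is hypothetical, and case-insensitive substring matching is an assumption that does not reproduce how CINAHL, PubMed, or Google Scholar index and match terms. The sample titles are taken from this article's reference list.

```python
def matches_search_string(text: str) -> bool:
    """Evaluate (("big data" AND healthcare) OR ("big data" AND "health care")).

    Assumes simple case-insensitive substring matching, which real
    bibliographic databases do not necessarily use.
    """
    lowered = text.lower()
    has_big_data = "big data" in lowered
    return (has_big_data and "healthcare" in lowered) or (
        has_big_data and "health care" in lowered
    )


titles = [
    "Big data analytics in healthcare: promise and potential",
    "The inevitable application of big data to health care",
    "Big data: The management revolution",
]
hits = [t for t in titles if matches_search_string(t)]
```

The two halves of the OR differ only in the spelling of "healthcare" versus "health care," which is why both variants are needed to avoid missing titles.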

Study selection

Through group research and a series of consensus meetings, researchers were trained to identify articles germane to this review and to recommend elimination of all others. A shared spreadsheet was used by the research team to parse through the list of articles. Researchers read all articles in their entirety. A total of 97 articles were eliminated due to various exclusion criteria (not germane to big data or health care, editorial only, not an academic journal, or duplicate from another search), and four additional articles were identified from the references of the 24 that remained. The reviewers made these rejections and additional recommendations through a series of consensus meetings, in which recommendations were discussed until consensus was reached. A total of 28 articles remained in the final review.
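The article counts reported in this selection process can be traced with simple arithmetic. The sketch below uses only the figures stated in the text; the function name `selection_flow` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def selection_flow(after_filters: int, excluded: int, added_from_references: int) -> dict:
    """Trace article counts through screening and reference-list additions."""
    remaining = after_filters - excluded
    final = remaining + added_from_references
    return {"screened": after_filters, "remaining": remaining, "final": final}


# Figures as reported: 121 after filters, 97 excluded, 4 added from references.
flow = selection_flow(after_filters=121, excluded=97, added_from_references=4)
```

The intermediate and final values agree with the 24 remaining articles and the 28 articles in the final review.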

Data collection process and identification of summary measures

Each article was reviewed by at least two authors to identify the relevant points. All reviewers used a spreadsheet template to summarize their key observations from each article. One team member combined the spreadsheets into one and shared it once again. Reviewers held one more consensus meeting to discuss their findings. From this meeting, trends were identified, and from those trends, inferences were made.

Additional analysis

From the list of observations, reviewers were able to identify some common threads that emerged as challenges and opportunities in health care that permeated multiple articles. Separate tables were created to group the threads, and from each of these tables, common themes were identified. These common themes only emerged when reviewers combined their observations. These themes were tabulated and counted for additional analysis.

Results

Study selection

As depicted in Figure 1, 13,935 articles resulted from the initial search. Filters such as "date published" (2010-2015), "academic journals," and "English language" were applied to narrow the results to the scope of the study. Reviewers agreed to eliminate editorials and focus on those articles that studied big data, as described in the Introduction section of this manuscript. At the end of the search process, only 28 remained. The articles reviewed for this study ranged from 2012 to 2015. The majority of the literature chosen for this paper was published in 2014 (15/28, 54%), and a minority was published in 2015 (2/28, 7%); the latter most likely reflects how early in the year the search was conducted.

Synthesis of results

Multiple reviewers read each article in its entirety. Articles were included or excluded based on the criteria illustrated in Figure 1. All articles included in the analysis were sorted by date and are listed in Multimedia Appendix 1.

A study catalog number was assigned to each article to simplify the analysis. Researchers summarized the main points of each article for further analysis.

Additional analysis

Through the combination of observations, reviewers identified common threads (challenges and opportunities) and themes from each thread. Themes were organized into affinity diagrams (Tables 1 and 2), compared, and discussed among researchers.

Challenges for big data in health care

Nine themes emerged under the category of challenges: data structure, security, data standardization, data storage and transfers, managerial issues such as governance and ownership, lack of skill of data analysts, inaccuracies in data, regulatory compliance, and real-time analytics. Examples for each theme are provided in Table 1. A total of 60 observations were made for challenges.

Table 1. Themes associated with challenges for big data in health care

| Themes | Examples | Number of articles (n) | Articles themes appeared in | % of total articles (N=28) |
|---|---|---|---|---|
| Data structure | Fragmented data, incompatible formats, heterogeneous data, raw and unstructured datasets, large volume, high variety and velocity, lack of transparency | 17 | 1, 2, 7-9, 12, 14-19, 22, 25-28 | 61% |
| Security | Privacy, confidentiality, data duplication, integrity | 14 | 2, 4, 7-9, 12, 13, 17, 21, 22, 25-28 | 50% |
| Data standardization | Limited interoperability, data acquisition and cleansing, global sharing, terminology, language barriers | 11 | 4, 5, 7-9, 11, 12, 15, 16, 22, 25 | 39% |
| Storage and transfers | Expensive to store; transfer from one place to other; store electronic data; securely extract, transmit, and process | 8 | 1, 4, 7, 12, 22, 26, 28 | 28% |
| Managerial issues | Governance issues, ownership issues | 4 | 2, 8, 14, 22 | 14% |
| Lack of skill | Untrained workers | 3 | 5, 9, 14 | 11% |
| Inaccuracies | Inconsistencies, lack of precision, data timeliness | 1 | 9 | 4% |
| Regulatory compliance | Legal concerns | 1 | 13 | 4% |
| Real-time analytics | Real-time analytics | 1 | 9 | 4% |
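The rank-ordering by frequency of occurrence can be reproduced from the counts in Table 1. The sketch below is illustrative rather than the authors' own tabulation: the names `challenge_counts` and `rank_themes` are hypothetical, and rounding to the nearest whole percent is an assumption about how the published percentages were derived.

```python
TOTAL_ARTICLES = 28  # N reported in Table 1

# Article counts per challenge theme, copied from Table 1.
challenge_counts = {
    "Data structure": 17,
    "Security": 14,
    "Data standardization": 11,
    "Storage and transfers": 8,
    "Managerial issues": 4,
    "Lack of skill": 3,
    "Inaccuracies": 1,
    "Regulatory compliance": 1,
    "Real-time analytics": 1,
}


def rank_themes(counts: dict, total: int) -> list:
    """Return (theme, count, percent) tuples sorted by descending count."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(theme, n, round(100 * n / total)) for theme, n in ranked]


ranking = rank_themes(challenge_counts, TOTAL_ARTICLES)
total_observations = sum(challenge_counts.values())
```

The counts sum to the 60 observations reported for challenges. Under the rounding assumption, every percentage matches the table except "Storage and transfers," where 8/28 rounds to 29 percent rather than the 28 percent printed; that row also lists seven article numbers against a count of eight, and the sketch uses the printed count.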

The four Vs appear in multiple places under the "Challenges" category. Volume and variety are seen by name under the theme of "Data structure." Variety is also implied in the same theme, but listed as "Incompatible formats," as well as "Raw and unstructured datasets." Variety can also be inferred from the theme of "Data standardization," listed as "Limited interoperability." Velocity is seen in the theme "Real-time analytics." Veracity is seen under the theme of "Data standardization" but listed as "Data acquisition and cleansing," "Terminology," and "Language barriers." It is also inferred in the theme "Inaccuracies" listed as "Inconsistencies" and "Lack of precision."

Data structure issues: Issues related to data structure were addressed in the majority of the papers reviewed for this study. It is essential that the key functions of data processing are supported by the applications of big data.[13] Big data applications should be user-friendly, transparent, and menu-driven.[13][14] The majority of data in health care is unstructured, such as from natural language processing.[12] It is often fragmented, dispersed, and rarely standardized.[12][13][15][16][17][18][19][20][21] It is no secret that EHRs do not share well across organizational lines, but unstructured data is difficult to aggregate and analyze even within the same organization. It is no wonder that 61 percent of the articles analyzed listed this as a concern; big data analytics will need to address this significant challenge.

Research data within the health care sector is more heterogeneous than the research data produced within other research fields.[3][5][12] Data from both research and public health is often produced in large volumes.[15][22][23] Another structure-related issue results from the changing health care fee-for-service care model.[4] Finally, big data will need to address issues with the transparency of metadata.[16][24]

Security issues: There are considerable privacy concerns regarding the use of big data analytics, specifically in health care given the enactment of Health Insurance Portability and Accountability Act (HIPAA) legislation.[15] Data that is made available on open source is freely available and, hence, highly vulnerable.[12][13][18][20] Further, due to the sensitivity of health care data, there are significant concerns related to confidentiality.[25][26] Moreover, this information is centralized, and as such, it is highly vulnerable to attacks.[25] For these reasons, enabling privacy and security is very important, as illustrated by a frequency of mention in 50 percent of the literature reviewed.

Data standardization issues: Although EHRs share data within the same organization (intra-organizational sharing), EHR platforms are fragmented at best. Data is stored in formats that are not compatible with all applications and technologies.[13][22] This lack of data standardization also causes problems in the transfer of that data.[5][25] It complicates data acquisition and cleansing.[5][25][26] About 39 percent of the literature mentioned this challenge.

References

  1. "PRISMA". The Ottawa Hospital. http://www.prisma-statement.org/. Retrieved 30 July 2015. 
  2. McAfee, A.; Brynjolfsson, E. (2012). "Big data: The management revolution". Harvard Business Review 90 (10): 60–6. PMID 23074865. 
  3. Heudecker, N. (31 July 2013). "Hype Cycle for Big Data, 2013". Gartner, Inc. https://www.gartner.com/doc/2574616/hype-cycle-big-data-. Retrieved 08 November 2016. 
  4. Kayyali, B.; Knott, D.; Van Kuiken, S. (April 2013). "The big-data revolution in US health care: Accelerating value and innovation" (PDF). McKinsey & Company. https://digitalstrategy.nl/wp-content/uploads/E2-2013.04-The-big-data-revolution-in-US-health-care-Accelerating-value-and-innovation.pdf. Retrieved 11 November 2016. 
  5. Chawla, N.V.; Davis, D.A. (2013). "Bringing big data to personalized healthcare: A patient-centered framework". Journal of General Internal Medicine 28 (Suppl. 3): S660-5. doi:10.1007/s11606-013-2455-8. PMC PMC3744281. PMID 23797912. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3744281. 
  6. "Summary dbGaP Statistics". Genomic Data Sharing. National Institutes of Health. 23 June 2014. https://gds.nih.gov/17summary_dbGaP_statistics.html. Retrieved 09 November 2016. 
  7. Feldman, B.; Martin, E.M.; Skotnes, T. (October 2012). "Big Data in Healthcare: Hype and Hope" (PDF). Dr. Bonnie 360º. https://www.ghdonline.org/uploads/big-data-in-healthcare_B_Kaplan_2012.pdf. Retrieved 09 November 2016. 
  8. Hillestad, R.; Bigelow, J.; Bower, A. et al. (2005). "Can electronic medical record systems transform health care? Potential health benefits, savings, and costs". Health Affairs 24 (5): 1103–17. doi:10.1377/hlthaff.24.5.1103. PMID 16162551. 
  9. Centers for Medicare & Medicaid Services (28 July 2010). "Medicare and Medicaid Programs; Electronic Health Record Incentive Program". Federal Register. https://www.federalregister.gov/documents/2010/07/28/2010-17207/medicare-and-medicaid-programs-electronic-health-record-incentive-program. Retrieved 09 November 2016. 
  10. "State Health Information Exchange Cooperative Agreement Program". State Health Information Exchange. U.S. Department of Health and Human Services. 14 March 2014. https://www.healthit.gov/policy-researchers-implementers/state-health-information-exchange. Retrieved 09 November 2016. 
  11. Murdoch, T.B.; Detsky, A.S. (2013). "The inevitable application of big data to health care". JAMA 309 (13): 1351–2. doi:10.1001/jama.2013.393. PMID 23549579. 
  12. Jee, K.; Kim, G.H. (2013). "Potentiality of big data in the medical sector: focus on how to reshape the healthcare system". Healthcare Informatics Research 19 (2): 79–85. doi:10.4258/hir.2013.19.2.79. PMC PMC3717441. PMID 23882412. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3717441. 
  13. Raghupathi, W.; Raghupathi, V. (2014). "Big data analytics in healthcare: promise and potential". Health Information Science and Systems 2: 3. doi:10.1186/2047-2501-2-3. PMC PMC4341817. PMID 25825667. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4341817. 
  14. Song, T.M.; Song, J.; An, J.Y. et al. (2014). "Psychological and social factors affecting Internet searches on suicide in Korea: A big data analysis of Google search trends". Yonsei Medical Journal 55 (1): 254-63. doi:10.3349/ymj.2014.55.1.254. PMC PMC3874928. PMID 24339315. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3874928. 
  15. Fernandes, L.; O'Connor, M.; Weaver, V. (2012). "Big data, bigger outcomes: Healthcare is embracing the big data movement, hoping to revolutionize HIM by distilling vast collection of data for specific analysis". Journal of AHIMA 83 (10): 38–43. PMID 23061351. 
  16. Kim, T.; Park, K.; Yi, S. et al. (2014). "A big data framework for u-Healthcare systems utilizing vital signs". 2014 International Symposium on Computer, Consumer and Control (IS3C) 2014. doi:10.1109/IS3C.2014.135. 
  17. Augustine, P.D. (2014). "Leveraging big data analytics and Hadoop in developing India's healthcare service". International Journal of Computer Applications 89 (16): 44-50. doi:10.5120/15719-4622. 
  18. Jiang, P.; Winkley, J.; Zhao, C. et al. (2016). "An intelligent information forwarder for healthcare big data systems with distributed wearable sensors". IEEE Systems Journal 10 (3): 1147-1159. doi:10.1109/JSYST.2014.2308324. 
  19. Hrovat, G.; Stiglic, G.; Kokol, P. et al. (2014). "Contrasting temporal trend discovery for large healthcare databases". Computer Methods and Programs in Biomedicine 113 (1): 251-7. doi:10.1016/j.cmpb.2013.09.005. PMID 24120407. 
  20. Baro, E.; Degou, S.; Beuscart, R. et al. (2015). "Toward a literature-driven definition of big data in healthcare". BioMed Research International 2015: 639021. doi:10.1155/2015/639021. PMC PMC4468280. PMID 26137488. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4468280. 
  21. Naqishbandi, T.; Imthyaz Sheriff, C.; Qazi, S. (2015). "Big data, CEP and IoT: Redefining holistic healthcare information systems and analytics". International Journal of Engineering Research & Technology 4 (1): 1–6. https://www.ijert.org/view-pdf/12266/big-data-cep-and-iot--redefining-holistic-healthcare-information-systems-and-analytics. 
  22. Hsieh, J.C.; Li, A.H.; Yang, C.C. (2013). "Mobile, cloud, and big data computing: contributions, challenges, and new directions in telecardiology". International Journal of Environmental Research and Public Health 10 (11): 6131-53. doi:10.3390/ijerph10116131. PMC PMC3863891. PMID 24232290. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3863891. 
  23. Sepulveda, M.J. (2013). "From worker health to citizen health: moving upstream". Journal of Occupational and Environmental Medicine 55 (12 Suppl.): S52-7. doi:10.1097/JOM.0000000000000033. PMC PMC4171364. PMID 24284749. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4171364. 
  24. Baker, T.B.; Gustafson, D.H.; Shah, D. (2014). "How can research keep up with eHealth? Ten strategies for increasing the timeliness and usefulness of eHealth research". Journal of Medical Internet Research 16 (2): e36. doi:10.2196/jmir.2925. PMC PMC3961695. PMID 24554442. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3961695. 
  25. Mohr, D.C.; Burns, M.N.; Schueller, S.M. et al. (2013). "Behavioral intervention technologies: evidence review and recommendations for future research in mental health". General Hospital Psychiatry 35 (4): 332-8. doi:10.1016/j.genhosppsych.2013.03.008. PMC PMC3719158. PMID 23664503. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3719158. 
  26. Mancini, M. (2014). "Exploiting big data for improving healthcare services". Journal of e-Learning and Knowledge Society 10 (2): 1–11. doi:10.20368/je-lks.v10i2.929. http://je-lks.org/ojs/index.php/Je-LKS_EN/article/view/928. 

Abbreviations

ARRA: American Recovery and Reinvestment Act

EHR: electronic health record

HIE: Health Information Exchange

HIPAA: Health Insurance Portability and Accountability Act

HITECH: Health Information Technology for Economic and Clinical Health

MeSH: Medical Subject Headings

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In several cases the PubMed ID was missing and was added to make the reference more useful.

Per the distribution agreement, the following copyright information is also being added:

©Clemens Scott Kruse, Rishi Goswamy, Yesha Raval, Sarah Marawi. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.11.2016.