Journal:Challenges and opportunities of big data in health care: A systematic review

From LIMSWiki
Full article title Challenges and opportunities of big data in health care: A systematic review
Journal JMIR Medical Informatics
Author(s) Kruse, Clemens S.; Goswamy, Rishi; Raval, Yesha; Marawi, Sarah
Author affiliation(s) School of Health Administration, Texas State University
Primary contact Email: scottkruse [at] txstate.edu; Phone: 1 2103554742
Editors Eysenbach, G.
Year published 2016
Volume and issue 4 (4)
Page(s) e38
DOI 10.2196/medinform.5359
ISSN 2291-9694
Distribution license Creative Commons Attribution 2.0
Website http://medinform.jmir.org/2016/4/e38/
Download http://medinform.jmir.org/2016/4/e38/pdf (PDF)

Abstract

Background: Big data analytics offers promise in many business sectors, and health care is looking at big data to provide answers to many age-related issues, particularly dementia and chronic disease management.

Objective: The purpose of this review was to summarize the challenges faced by big data analytics and the opportunities that big data opens in health care.

Methods: Three searches (PubMed/MEDLINE, CINAHL, and Google Scholar) were performed for publications between January 1, 2010 and January 1, 2016, and content was assessed for relevance to big data in health care. From the results of the searches in research databases and Google Scholar (N=28), the authors summarized content and identified nine and 11 themes under the categories "Challenges" and "Opportunities," respectively. We rank-ordered and analyzed the themes based on the frequency of occurrence.

Results: The top challenges were issues of data structure, security, data standardization, storage and transfers, and managerial skills such as data governance. The top opportunities revealed were quality improvement, population management and health, early detection of disease, data quality, structure, and accessibility, improved decision making, and cost reduction.

Conclusions: Big data analytics has the potential for positive impact and global implications; however, it must overcome some legitimate obstacles.

Keywords: big data, analytics, health care, human genome, electronic medical record

Introduction

Rationale

Big data analytics offers promise in many business sectors, and health care is looking at big data to provide answers to many age-related issues, particularly dementia and chronic disease management. This systematic review explores the depth of big data analytics since 2010 and identifies both challenges and opportunities associated with big data in health care. The review follows the standard set by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, 2009).[1]

Big data is commonly defined through the four Vs: volume (scale or quantity of data), velocity (speed and analysis of real-time or near-real-time data), variety (different forms of data, often from disparate data sources), and veracity (quality assurance of the data). The first three Vs are found in most literature[2][3], and the fourth V is a goal.[4]

As of 2012, about 2.5 exabytes of data are created each day; Walmart can collect up to 2.5 petabytes of customer-related data per hour.[2] The industry of health care produces and collects data at a staggering speed, but different electronic health records (EHRs) collect data in different structures: structured, unstructured, and semistructured. This variety can pose difficulty when seeking veracity or quality assurance of the data. The EHRs can provide a rich source of data, ripe for analysis to increase our understanding of disease mechanisms, as well as better and personalized health care, but the data structures pose a problem to standard means of analysis.[5]

There are several large sources for big data in health care: genomics, EHR, medical monitoring devices, wearable video devices, and health-related mobile phone apps. Approximately 483 studies on genomics are registered with the U.S. Department of Health and Human Services; these studies are being conducted in nine countries, and they all use portions of the data from the Human Genome Project.[6] The EHR, being adopted in many countries, offers a source of data the depth of which is almost inconceivable. About 500 petabytes of data was generated by the EHR in 2012, and by 2020, the data will reach 25,000 petabytes.[7] The EHR can collect data from other monitoring devices, but the continuous data streams are not consistently saved in the longitudinal record.
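As a rough check on the scale of that projection, the implied compound annual growth rate can be computed from the two figures quoted above (about 500 petabytes in 2012 and 25,000 petabytes by 2020). The sketch below assumes smooth, constant year-over-year growth, which the cited source does not claim; the function name is illustrative only.

```python
def implied_annual_growth(start_pb: float, end_pb: float, years: int) -> float:
    """Constant yearly growth rate that turns start_pb into end_pb over `years` years."""
    return (end_pb / start_pb) ** (1 / years) - 1


# Figures quoted in the text: ~500 PB in 2012, 25,000 PB projected for 2020.
rate = implied_annual_growth(500, 25_000, 2020 - 2012)
print(f"Implied growth: {rate:.0%} per year")
```

Under that assumption, a fifty-fold increase over eight years corresponds to EHR data volume growing by roughly 63 percent each year.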

The decrease in the cost of storage has enabled exponential growth in data collection, but the ability to analyze this quantity of data is the center of gravity for “big data” in health care. In the United States, financial incentives offered for the “meaningful use” of health information technology have spurred growth in the adoption of the EHR and other enabling health-related technology since 2009.

Health information systems show great potential to improve the efficiency of care delivery, reduce overall costs to the health care system, and markedly improve patient outcomes.[8] The U.S. government has allocated billions of dollars to help the country’s health care market realize some of these efficiencies and savings. Specific provisions of the Health Information Technology for Economic and Clinical Health (HITECH) Act, part of the American Recovery and Reinvestment Act, acknowledge the importance of information technology (IT) in the delivery of health care within the United States.[9] The Act allocates approximately US $17.2 billion in incentives for the adoption and meaningful use of health information technology, part of which involves participation in the electronic exchange of clinical information. In 2010, Congress passed the Health Information Exchange (HIE) Challenge Grant Program, which contributed about US $547.7 million to state HIE programs.[10]

With the implementation of this legislation as well as the technologies associated with it, it is imperative to effectively organize and process the ever-increasing quantity of data that is digitally collected and stored within health care organizations. Other industries such as astronomy, retail, search engines, and politics have developed advanced data-handling capabilities to convert data into knowledge. Health care needs to follow their lead so that decisions regarding organizational objectives and goals can be met.[4][11][12] This evolutionary process of data management is collectively known as big data, and it is essential to the future of adoption and management of health information technology.[13]

Objectives

The purpose of this systematic review is to objectively review articles and studies published in academic journals in order to compile a list of challenges and opportunities faced by big data analytics in health care in the United States. Particular emphasis will be placed on age-related applications of big data.

Methods

Eligibility criteria

Articles and studies were eligible for analysis if they were published between 2010 and 2015, published in academic journals, and written in English. The researchers chose the range from 2010 to 2015 for two reasons: HITECH was passed in 2009, and research and other articles on the topic appeared to blossom in 2010. We focused on academic journals for their peer-review quality and to decrease the chance of selecting material about big data from a non-credible source.

Information sources

Key terms from Medical Subject Headings (MeSH) were combined with Boolean operators and used in two common research databases, CINAHL and PubMed, along with a general search in Google Scholar (see Figure 1), in January 2016.

These terms were chosen not only because they are the focus of the review, but also because they were identified in the initial research into the definition of big data.



Fig. 1 Literature review process with inclusion and exclusion criteria

Search

The following search string was used in all three searches: ((“big data” AND healthcare) OR (“big data” AND “health care”)). This search string was used in CINAHL, PubMed (MEDLINE), and Google Scholar. In the two research databases, our team was able to restrict the search to academic journals (including other systematic reviews). MEDLINE was excluded in CINAHL because it was already captured in PubMed. Google Scholar creates difficulty for searches because of its severe limit of filters typically associated with academic research. The initial 13,935 results were limited by restricting dates to the last five years, limiting results to academic journals and MEDLINE, and in Google Scholar by restricting the keyword search to titles. The result from the filters ended with 121 articles to review.
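The Boolean logic of the search string can be sketched as a simple title filter. This is an illustration only: the databases named above apply their own tokenization, indexing, and field rules, whereas the hypothetical `matches_search` function below uses plain case-insensitive substring matching, and the sample titles are invented.

```python
def matches_search(title: str) -> bool:
    """Apply ("big data" AND healthcare) OR ("big data" AND "health care") to a title."""
    t = title.lower()
    return ("big data" in t and "healthcare" in t) or (
        "big data" in t and "health care" in t
    )


# Hypothetical titles, for illustration only.
titles = [
    "Big data in health care: a primer",
    "Big Data analytics for healthcare operations",
    "Big data in retail logistics",
    "Health care quality indicators",
]
hits = [t for t in titles if matches_search(t)]
print(hits)
```

Only the first two titles satisfy the expression, because both halves of the OR require the phrase "big data" alongside one spelling of the health care term.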

Study selection

Through group research and a series of consensus meetings, researchers were trained to identify articles germane to this review and to recommend elimination of all others. A shared spreadsheet was used by the research team to parse through the list of articles. Researchers read all articles in their entirety. A total of 97 articles were eliminated due to various exclusion criteria (not germane to big data or health care, editorial only, not an academic journal, or duplicate from another search), and four additional articles were identified from the references of the 24 that remained. The group of reviewers made these rejections and additional recommendations through a series of consensus meetings, in which recommendations were discussed until consensus was reached. A total of 28 articles remained in the final review.
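The selection arithmetic, and the removal of duplicates across searches, can be sketched as follows. The counts come from the text; the `dedupe_by_title` helper and the sample records are hypothetical and do not represent the review team's actual spreadsheet workflow.

```python
def dedupe_by_title(records):
    """Keep the first record for each title, ignoring case and extra whitespace."""
    seen, unique = set(), []
    for record in records:
        key = " ".join(record["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records from different sources; not taken from the review.
records = [
    {"source": "PubMed", "title": "Big data in health care"},
    {"source": "CINAHL", "title": "Big Data in  Health Care"},
    {"source": "Google Scholar", "title": "Big data and healthcare costs"},
]
unique_records = dedupe_by_title(records)

# Counts reported in the text.
screened = 121
excluded = 97
added_from_references = 4
remaining = screened - excluded                   # articles left after exclusions
final_count = remaining + added_from_references   # articles in the final review
print(len(unique_records), remaining, final_count)
```

The 24 articles left after exclusion plus the four found through reference lists give the 28 articles analyzed.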

Data collection process and identification of summary measures

Each article was reviewed by at least two authors to identify the relevant points. All reviewers used a spreadsheet template to summarize their key observations from each article. One team member combined the spreadsheets into one and shared it once again. Reviewers held one more consensus meeting to discuss their findings. From this meeting, trends were identified, and from those trends, inferences were made.

Additional analysis

From the list of observations, reviewers were able to identify some common threads that emerged as challenges and opportunities in health care that permeated multiple articles. Separate tables were created to group the threads, and from each of these tables, common themes were identified. These common themes only emerged when reviewers combined their observations. These themes were tabulated and counted for additional analysis.
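A minimal sketch of this tabulation step is shown below, assuming each reviewer observation is recorded as an (article, theme) pair. The data are hypothetical, and counting each theme at most once per article is an assumption made here so that the tally reflects the number of articles rather than the number of reviewer notes.

```python
from collections import Counter

# Hypothetical reviewer observations as (article id, theme); not the study's data.
observations = [
    (1, "Data structure"), (1, "Data structure"), (1, "Storage and transfers"),
    (2, "Data structure"), (2, "Security"),
    (3, "Security"),
]


def theme_counts(observations):
    """Count the number of distinct articles in which each theme appears."""
    unique_pairs = set(observations)  # two reviewers noting the same theme count once
    return Counter(theme for _, theme in unique_pairs)


counts = theme_counts(observations)
# Rank by frequency (descending), breaking ties alphabetically.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```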

Results

Study selection

As depicted in Figure 1, 935 articles resulted from the initial search. Filters such as "date published" (2010-2015), "academic journals," and "English language" were implemented to reduce the range to what was being studied. Reviewers agreed to eliminate editorials and focus on those articles that studied big data, as described in the Introduction section of this manuscript. At the end of the search process, only 28 remained. The articles reviewed for this study ranged from 2012 to 2015. The majority of the literature chosen for this paper was published in 2014 (15/28, 54%), and a minority was published in 2015 (2/28, 7%); the latter was most likely due to the early part of the year when the search was conducted.

Synthesis of results

Multiple reviewers read each article in its entirety. Articles were included or excluded based on the criteria illustrated in Figure 1. All articles included in the analysis were sorted by date and are listed in Multimedia Appendix 1.

A study catalog number was assigned to each article to simplify the analysis. Researchers summarized the main points of each article for further analysis.

Additional analysis

Through the combination of observations, reviewers identified common threads (challenges and opportunities) and themes from each thread. Themes were organized into affinity diagrams (Tables 1 and 2), compared, and discussed among researchers.

Challenges for big data in health care

Nine themes emerged under the category of challenges: data structure, security, data standardization, data storage and transfers, managerial issues such as governance and ownership, lack of skill of data analysts, inaccuracies in data, regulatory compliance, and real-time analytics. Examples for each theme are provided in Table 1. A total of 60 observations were made for challenges.

Table 1. Themes associated with challenges for big data in health care
Theme | Examples | Number of articles (n) | Articles in which theme appeared | % of total articles (N=28)
Data structure | Fragmented data, incompatible formats, heterogeneous data, raw and unstructured datasets, large volume, high variety and velocity, lack of transparency | 17 | 1, 2, 7-9, 12, 14-19, 22, 25-28 | 61%
Security | Privacy, confidentiality, data duplication, integrity | 14 | 2, 4, 7-9, 12, 13, 17, 21, 22, 25-28 | 50%
Data standardization | Limited interoperability, data acquisition and cleansing, global sharing, terminology, language barriers | 11 | 4, 5, 7-9, 11, 12, 15, 16, 22, 25 | 39%
Storage and transfers | Expensive to store; transfer from one place to other; store electronic data; securely extract, transmit, and process | 8 | 1, 4, 7, 12, 22, 26, 28 | 29%
Managerial issues | Governance issues, ownership issues | 4 | 2, 8, 14, 22 | 14%
Lack of skill | Untrained workers | 3 | 5, 9, 14 | 11%
Inaccuracies | Inconsistencies, lack of precision, data timeliness | 1 | 9 | 4%
Regulatory compliance | Legal concerns | 1 | 13 | 4%
Real-time analytics | Real-time analytics | 1 | 9 | 4%

The four Vs appear in multiple places under the "Challenges" category. Volume and variety are seen by name under the theme of "Data structure." Variety is also implied in the same theme, but listed as "Incompatible formats," as well as "Raw and unstructured datasets." Variety can also be inferred from the theme of "Data standardization," listed as "Limited interoperability." Velocity is seen in the theme "Real-time analytics." Veracity is seen under the theme of "Data standardization" but listed as "Data acquisition and cleansing," "Terminology," and "Language barriers." It is also inferred in the theme "Inaccuracies" listed as "Inconsistencies" and "Lack of precision."

Data structure issues:

Issues related to data structure were addressed in the majority of the papers reviewed for this study. It is essential that the key functions of data processing are supported by the applications of big data.[13] Big data applications should be user-friendly, transparent, and menu-driven.[13][14] The majority of data in health care is unstructured, such as from natural language processing.[12] It is often fragmented, dispersed, and rarely standardized.[12][13][15][16][17][18][19][20][21] It is no secret that EHRs do not share well across organizational lines, but unstructured data is difficult to aggregate and analyze even within the same organization. It is no wonder that 61 percent of the articles analyzed listed this as a concern; big data analytics will need to address this significant challenge.

Research data within the health care sector is more heterogeneous than the research data produced within other research fields.[3][5][12] Data from both research and public health is often produced in large volumes.[15][22][23] Another structure-related issue results from the changing health care fee-for-service care model.[4] Finally, big data will need to address issues with the transparency of metadata.[16][24]

Security issues:

There are considerable privacy concerns regarding the use of big data analytics, specifically in health care given the enactment of Health Insurance Portability and Accountability Act (HIPAA) legislation.[15] Data that is made available on open source is freely available and, hence, highly vulnerable.[12][13][18][20] Further, due to the sensitivity of health care data, there are significant concerns related to confidentiality.[25][26] Moreover, this information is centralized, and as such, it is highly vulnerable to attacks.[25] For these reasons, enabling privacy and security is very important, as illustrated by a frequency of mention in 50 percent of the literature reviewed.

Data standardization issues:

Although EHRs share data within the same organization, intra-organizational EHR platforms are fragmented at best. Data is stored in formats that are not compatible with all applications and technologies.[13][22] This lack of data standardization also causes problems in the transfer of that data.[5][25] It complicates data acquisition and cleansing.[5][25][26] About 39 percent of the literature mentioned this challenge.

Limited interoperability poses a large challenge for big data, as data is rarely standardized.[12][13][16][22] This leaves big data to face issues related to the acquisition and cleansing of data into a standardized format to enable analysis and global sharing.[13][17][23][25][27] With globalization of data, big data will have to deal with a variety of standards, barriers of language, and different terminologies.

Storage and transfers:

Data generation is inexpensive compared with the storage and transfer of the same. Once data is generated, the costs associated with securing and storing it remain high.[25] Costs are also incurred with transferring data from one place to another as well as analyzing it.[14][21][22] Some researchers have been able to combine the themes of "Data structure" and "Storage and transfers" when they illustrate how structured data can be easily stored, queried, analyzed, and so forth, but unstructured data is not as easily manipulated.[13] Cloud-based health information technology has the additional layer of security associated with the extraction, transformation, and loading of patient-related data.[27] The use of big data should address issues related to increased expenditures as well as the transmittance of secure or insecure information. About 29 percent of the literature mentioned this challenge.

Managerial issues:

Data governance will need to move up on the priority list of organizations, and data should be treated as a primary asset instead of a by-product of the business.[15] Data ownership and data stewardship should create new roles in business that consider big data analytics[15], and new partnerships will need to be brokered when sharing data.[23][24][27] About 14 percent of the literature mentioned this point.

Lack of appropriate skills:

It is important that health care workers are also kept up-to-date with the use of constantly changing technology, techniques, and a constantly moving standard of care.[5][24] Due to the constant evolution of technology, there exist populations of individuals lacking specific skills; as such, this is also a significant continuing barrier to the implementation of big data.[12] About 11 percent of the literature expressed this challenge.

Inaccuracies (veracity):

Self-reported data is extensively used in health care, and so it is crucial that the data collected in this manner be consistent.[12] Keeping information current as well as accurate is another challenge of data collection. Precision of data is also needed to provide accurate information.[12] Only 4 percent of the literature mentioned this challenge.

Regulatory compliance issues:

Health care organizations should be aware of the various legal issues that can surface in the process of managing a high volume of sensitive information. Organizations implementing big data analytics as a part of their information systems will have to comply with a significant amount of standards and regulatory compliance issues specific to health care.[28] Only 4 percent of the literature mentioned this challenge.

Real-time analytics (velocity):

One of the key requirements in health care is the ability to use big data in real time. Here, real time means using applications such as cloud computing to view the data as it is produced. The use of these technologies leads to issues of security and privacy of patient information.[12] Only 4 percent of the literature mentioned this challenge.

Challenges most often mentioned or discussed were data structure (17/28, 61%), security (14/28, 50%), data standardization (11/28, 39%), and data storage and transfers (8/28, 29%). The other five challenges each comprised less than 15 percent of the observations.
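The percentages reported for the challenge themes are each theme's article count divided by the 28 articles reviewed. The sketch below recomputes them from the counts in Table 1; rounding to the nearest whole percent is an assumption about how the reported figures were derived.

```python
TOTAL_ARTICLES = 28

# Article counts per challenge theme, as listed in Table 1.
challenge_counts = {
    "Data structure": 17,
    "Security": 14,
    "Data standardization": 11,
    "Storage and transfers": 8,
    "Managerial issues": 4,
    "Lack of skill": 3,
    "Inaccuracies": 1,
    "Regulatory compliance": 1,
    "Real-time analytics": 1,
}


def percent(n: int, total: int = TOTAL_ARTICLES) -> int:
    """Share of reviewed articles, rounded to the nearest whole percent."""
    return round(100 * n / total)


# Print themes from most to least frequently mentioned.
for theme, n in sorted(challenge_counts.items(), key=lambda kv: -kv[1]):
    print(f"{theme}: {n}/{TOTAL_ARTICLES} = {percent(n)}%")

total_observations = sum(challenge_counts.values())
print("Total observations:", total_observations)
```

The counts sum to the 60 observations reported for challenges, and 8 of 28 articles rounds to 29 percent.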

Opportunities for big data in health care

Eleven themes emerged under the category of opportunities: improve quality of care; managing population health; early detection of diseases; data quality, structure, and accessibility; improve decision making; cost reduction; patient-centric care; enhancing personalized medicine; globalization; fraud detection; and health-threat detection. Examples of each theme are listed in Table 2. A total of 113 observations were made for opportunities.

Table 2. Themes that emerged from the opportunities for big data in health care
Theme | Examples | Number of articles (n) | Articles in which theme appeared | % of total articles (N=28)
Improve quality of care | Improve efficiency, improve outcomes, reduce waste, reduce re-admissions, increased productivity and performance, risk reduction, process optimization | 18 | 2, 4, 5, 6, 8-13, 18-20, 22-25, 27 | 64%
Managing population health | Managing population health | 17 | 2, 5, 8-10, 12-14, 16, 18-20, 23, 25, 26, 28 | 61%
Early detection of diseases | Predicting epidemics, disease monitoring, health tracking, adopt and track healthier behaviors, predicting patient vulnerability, improved treatments | 17 | 2, 4, 5, 7-13, 15, 18-20, 23, 24, 28 | 61%
Data quality, structure, and accessibility | Large volumes, wide variety, creating transparency, high-velocity capture, access to primary data, reusable data, weed out unwanted data, open source/free access | 16 | 2, 4, 6, 9, 11, 12, 16, 18, 20-23, 25-28 | 57%
Improve decision making | Evidence-based medicine, new treatment guidelines, accuracy in information | 11 | 2-4, 7, 9, 12, 16, 20, 22, 23, 24 | 39%
Cost reduction | Inexpensive, reducing health care spending | 10 | 1, 3, 4, 7, 9, 11, 12, 14, 16, 18 | 36%
Patient-centric health care | Empowering patients, patients making informed decisions, increased communication | 8 | 2, 3, 5, 12, 14, 20, 22, 24 | 29%
Enhancing personalized medicine | Targeted approach | 6 | 4-6, 24, 25, 28 | 21%
Globalization | Widely accessible, global sharing, leveraging knowledge and practices, knowledge dissemination | 6 | 2, 6-8, 10, 20 | 21%
Fraud detection | Fraud detection | 3 | 8, 12, 28 | 11%
Health threat detection | Health threat detection | 1 | 7 | 4%

Despite the challenges that big data needs to overcome, the advanced analytics that are promised through big data offer tremendous opportunities for most stakeholders in the health care industry (patient, provider, and payer). More than 64 percent of the articles analyzed focused on quality improvement and more than 60 percent on managing population health and early detection of diseases through big data analytics. If even some of the opportunities of big data are realized, they can radically change patient outcomes and the way decisions are made by providers, and help solve some macro-level issues related to health care within countries such as the United States (cost, quality, and access).

Improve quality of care:

Big data has the potential and ability to improve the quality and efficiency of care.[5][15][23][29][30][31] Big data offers an ability to predict outcomes using the available primary or historical data and provide proof of benefit that could change established, industry-wide standards of care.[25][28] Leveraging technology on the patient end can also help with medication adherence.[25][23] This will most certainly play an important role in improving outcomes[2][13] and improve the health-related quality of life.[20][26][32]

Quality of care will also be improved by reducing waste of information, which will reduce inefficiencies.[13][26] This will also assist in analyzing real-time resource utilization productivity.[13] Quality can also be improved by reducing the rates of re-admissions, increasing operational efficiencies, and improving performance.[5][12][13] About 64 percent of the literature mentioned this opportunity.

Managing population health:

The management of population health and the early detection of diseases were topics that the authors thought would have highly similar results after the analysis. Although there was a large overlap between the two themes, there was also specific variation between them. So, the researchers chose to keep them separate. The theme of managing population health focused on special populations rather than public health.

Big data analytics define populations at a finer level of granularity than has ever been previously achieved.[5][14][15][33] It can help in managing the overall health of a population as well as specific individual health.[13][26][29] Big data can enable population health management from a local or global perspective.[31][34] This capability becomes more salient from the global perspective when considering the aging of the population and age-related health issues shared by many populations and subpopulations, many of which are underserved.[17][19][21][24][28][32] About 61 percent of the literature mentioned this opportunity.

Early detection of diseases:

Big data allows for the early detection of diseases, which aids in clinical objectives related to achieving improved treatments and better patient outcomes.[12][13][15][22][25] It is in this area that the authors found great promise in age-related illness and disease. Along with early detection, big data analytics can also help in the prevention of a wide range of deadly illnesses and in personalized disease management and monitoring.[5][19][21][22][29][34] It enables providers to track healthy behaviors and helps patients in monitoring their respective conditions.[25][32][33] This capability holds great potential when faced with either age-related diseases, or worldwide health issues such as cardiology.[16][22][28][31][34] About 61 percent of the literature mentioned this opportunity.

Data quality, structure, and accessibility:

Literature suggests that big data enables rapid capture of data and the conversion of primary, raw and unstructured data into meaningful information.[15][17][31][34] New knowledge can then be generated from high volumes of effective data, enabling reuse of the data.[15][20][21][32][33] Open-source technology increases accessibility to and transparency of the data.[12][25][26][30][35] Finally, data quality can be maintained using analytics to get rid of unnecessary information.[27] About 57 percent of the literature mentioned this opportunity.

Improving decision making:

Big data enables appropriate use of evidence-based medicine and helps health care providers make more informed decisions.[12][13][15][22] This, in turn, improves the quality of care provided to the patients.[16][31][36] Remote monitoring, patient profile analytics, and genomic analytics are examples of other applications that influence the decision-making process.[13][25]

The decision-making process can be highly optimized by the availability of accurate and up-to-date information, as decision making is influenced by the generation of new practices and treatment guidelines within clinical research. Allowing big data to influence decision making will allow for a faster and simpler process. This is done by either supporting or replacing human decision making. About 39 percent of the literature mentioned this opportunity.

Cost reduction:

The literature suggests that the decrease in cost of the elements of computing, such as storage and processing, leads to a decrease in the cost of data-intensive tasks.[2][13] This pass-through of savings will be seen across the spectrum of medicine[24][36] and the health care workforce.[25] Savings will be realized through more cost-effective treatments and monitoring to improve medication adherence[25][31] and through the reduction of transportation costs, as is experienced in cardiology.[12][17][22][34] About 36 percent of the literature mentioned this opportunity.

Patient-centric care:

Increasing the use of technology is slowly changing the direction of the health care sector from disease-centric care toward patient-centric care.[5] Big data will play a significant role in this transformation.[37] It will allow the information to be delivered to patients directly and empower them to play an active part in their care.[5][15][27] When patients are provided with the appropriate information, it will influence their decision making and allow them to make informed decisions.[13][24] Informed decisions will also be influenced by increased communication between patients, providers, as well as their communities.[5][24][32][36] About 29 percent of the literature mentioned this opportunity.

Enhancing personalized medicine:

With the use of big data, the objectives of personalized medicine can be translated into clinical practice.[5][25][30] Access to and processing of large volumes of data should enable a personalized patient-specific record of risks of disease.[25][29][32] Big data applications aim to make this process more efficient.[12] About 21 percent of the literature mentioned this opportunity.

Globalization:

Big data will actively help in disseminating the knowledge acquired from the data collected.[15][22][30] Big data plays an active role in leveraging practices and knowledge not only regionally but also globally.[12][15][29] Globalized data is more widely accessible, and providers may access new information from all regions.[22][23][32] About 21 percent of the literature mentioned this opportunity.

Fraud detection:

One of the most significant benefits offered by big data is that it is instrumental in detecting fraud in an efficient and effective manner.[13][23] For example, the unauthorized use of specific user accounts by third parties can be minimized.[21] Only about 11 percent of the literature mentioned this opportunity.

Health threat detection:

Big data offers an opportunity to detect threats more quickly and more accurately. This can be especially beneficial for government use.[22] Big data augments the current acquisition of protection against the increasing threats of foreign countries, criminals, terrorists, and others. Only about 4 percent of the literature mentioned this opportunity.

Opportunities most often mentioned or discussed were improving quality of care (18/28, 64%); managing population health (17/28, 61%); early detection of diseases (17/28, 61%); data quality, structure, and accessibility (16/28, 57%); improved decision making (11/28, 39%); cost reduction (10/28, 36%); patient-centric health care (8/28, 29%); enhancing personalized medicine (6/28, 21%); and globalization (6/28, 21%). The other two opportunities each comprised less than 15 percent of the observations.

Discussion

Summary of evidence

Although the integration of big data is well underway in industries such as finance and advertising, it has not yet been fully assimilated into health care. Challenges and opportunities were made quite clear in the articles analyzed in this review. Three of the four Vs (volume, velocity, and variety) were consistently addressed. The fourth V, veracity, was found but rarely listed by name. Tables 1 and 2 provide insightful information that was previously unpublished. These tables identify challenges and opportunities and illustrate their frequency of mention in the literature. This information is helpful to other researchers and innovators because it provides direction and proper emphasis of research effort. The listed challenges and opportunities are ordered by their frequency in the literature.

Limitations

A major limitation of this review is the low number of articles used in the analysis. If we were to do this over again, we would query another database to see whether additional articles were available for analysis.

Selection bias is a risk in any study. Our control for selection bias was the initial research up front to agree on a single definition of the concept of big data, and our consensus meetings to discuss findings. The consensus meetings offered great value to the process because they enabled the group to hear the focus of an individual and either provide feedback to confirm the focus or agree that the unique focus was warranted for all the articles in the review.

Another bias that we discuss regularly is publication bias. Journals tend to publish results that are statistically significant, which inherently limits the publication of research that may not reach that level. Our control for publication bias was to include Google Scholar in our search. Our intent was to identify material in lesser-known journals that might not be indexed in PubMed (MEDLINE) or CINAHL.

Conclusions

Big data and the use of advanced analytics have the potential to advance the way in which providers leverage technology to make informed clinical decisions. However, the vast amounts of information generated annually within health care must be organized and compartmentalized to enable universal accessibility and transparency between health care organizations.

Our systematic literature review revealed both challenges and opportunities that big data offers to the health care industry. The literature mentioned the challenges of data structure and security in at least 50 percent of the articles reviewed. The literature also mentioned the opportunities of increased quality, better management of population health, early detection of disease, and data quality, structure, and accessibility in at least 50 percent of the articles reviewed. These findings identify foci for future research.

Conflict of interest

None declared.

Multimedia appendix 1

Summary or relevance of cited work: PDF file, 33kb

References

  1. "PRISMA". The Ottawa Hospital. http://www.prisma-statement.org/. Retrieved 30 July 2015. 
  2. McAfee, A.; Brynjolfsson, E. (2012). "Big data: The management revolution". Harvard Business Review 90 (10): 60–6. PMID 23074865. 
  3. Heudecker, N. (31 July 2013). "Hype Cycle for Big Data, 2013". Gartner, Inc. https://www.gartner.com/doc/2574616/hype-cycle-big-data-. Retrieved 08 November 2016. 
  4. Kayyali, B.; Knott, D.; Van Kuiken, S. (April 2013). "The big-data revolution in US health care: Accelerating value and innovation" (PDF). McKinsey & Company. https://digitalstrategy.nl/wp-content/uploads/E2-2013.04-The-big-data-revolution-in-US-health-care-Accelerating-value-and-innovation.pdf. Retrieved 11 November 2016. 
  5. Chawla, N.V.; Davis, D.A. (2013). "Bringing big data to personalized healthcare: A patient-centered framework". Journal of General Internal Medicine 28 (Suppl. 3): S660–5. doi:10.1007/s11606-013-2455-8. PMC PMC3744281. PMID 23797912. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3744281. 
  6. "Summary dbGaP Statistics". Genomic Data Sharing. National Institutes of Health. 23 June 2014. https://gds.nih.gov/17summary_dbGaP_statistics.html. Retrieved 09 November 2016. 
  7. Feldman, B.; Martin, E.M.; Skotnes, T. (October 2012). "Big Data in Healthcare: Hype and Hope" (PDF). Dr. Bonnie 360º. https://www.ghdonline.org/uploads/big-data-in-healthcare_B_Kaplan_2012.pdf. Retrieved 09 November 2016. 
  8. Hillestad, R.; Bigelow, J.; Bower, A. et al. (2005). "Can electronic medical record systems transform health care? Potential health benefits, savings, and costs". Health Affairs 24 (5): 1103–17. doi:10.1377/hlthaff.24.5.1103. PMID 16162551. 
  9. Centers for Medicare & Medicaid Services (28 July 2010). "Medicare and Medicaid Programs; Electronic Health Record Incentive Program". Federal Register. https://www.federalregister.gov/documents/2010/07/28/2010-17207/medicare-and-medicaid-programs-electronic-health-record-incentive-program. Retrieved 09 November 2016. 
  10. "State Health Information Exchange Cooperative Agreement Program". State Health Information Exchange. U.S. Department of Health and Human Services. 14 March 2014. https://www.healthit.gov/policy-researchers-implementers/state-health-information-exchange. Retrieved 09 November 2016. 
  11. Murdoch, T.B.; Detsky, A.S. (2013). "The inevitable application of big data to health care". JAMA 309 (13): 1351–2. doi:10.1001/jama.2013.393. PMID 23549579. 
  12. Jee, K.; Kim, G.H. (2013). "Potentiality of big data in the medical sector: focus on how to reshape the healthcare system". Healthcare Informatics Research 19 (2): 79–85. doi:10.4258/hir.2013.19.2.79. PMC PMC3717441. PMID 23882412. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3717441. 
  13. Raghupathi, W.; Raghupathi, V. (2014). "Big data analytics in healthcare: promise and potential". Health Information Science and Systems 2: 3. doi:10.1186/2047-2501-2-3. PMC PMC4341817. PMID 25825667. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4341817. 
  14. Song, T.M.; Song, J.; An, J.Y. et al. (2014). "Psychological and social factors affecting Internet searches on suicide in Korea: A big data analysis of Google search trends". Yonsei Medical Journal 55 (1): 254–63. doi:10.3349/ymj.2014.55.1.254. PMC PMC3874928. PMID 24339315. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3874928. 
  15. Fernandes, L.; O'Connor, M.; Weaver, V. (2012). "Big data, bigger outcomes: Healthcare is embracing the big data movement, hoping to revolutionize HIM by distilling vast collection of data for specific analysis". Journal of AHIMA 83 (10): 38–43. PMID 23061351. 
  16. Kim, T.; Park, K.; Yi, S. et al. (2014). "A big data framework for u-Healthcare systems utilizing vital signs". 2014 International Symposium on Computer, Consumer and Control (IS3C) 2014. doi:10.1109/IS3C.2014.135. 
  17. Augustine, P.D. (2014). "Leveraging big data analytics and Hadoop in developing India's healthcare service". International Journal of Computer Applications 89 (16): 44–50. doi:10.5120/15719-4622. 
  18. Jiang, P.; Winkley, J.; Zhao, C. et al. (2016). "An intelligent information forwarder for healthcare big data systems with distributed wearable sensors". IEEE Systems Journal 10 (3): 1147–1159. doi:10.1109/JSYST.2014.2308324. 
  19. Hrovat, G.; Stiglic, G.; Kokol, P. et al. (2014). "Contrasting temporal trend discovery for large healthcare databases". Computer Methods and Programs in Biomedicine 113 (1): 251–7. doi:10.1016/j.cmpb.2013.09.005. PMID 24120407. 
  20. Baro, E.; Degou, S.; Beuscart, R. et al. (2015). "Toward a literature-driven definition of big data in healthcare". BioMed Research International 2015: 639021. doi:10.1155/2015/639021. PMC PMC4468280. PMID 26137488. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4468280. 
  21. Naqishbandi, T.; Imthyaz Sheriff, C.; Qazi, S. (2015). "Big data, CEP and IoT: Redefining holistic healthcare information systems and analytics". International Journal of Engineering Research & Technology 4 (1): 1–6. https://www.ijert.org/view-pdf/12266/big-data-cep-and-iot--redefining-holistic-healthcare-information-systems-and-analytics. 
  22. Hsieh, J.C.; Li, A.H.; Yang, C.C. (2013). "Mobile, cloud, and big data computing: contributions, challenges, and new directions in telecardiology". International Journal of Environmental Research and Public Health 10 (11): 6131–53. doi:10.3390/ijerph10116131. PMC PMC3863891. PMID 24232290. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3863891. 
  23. Sepulveda, M.J. (2013). "From worker health to citizen health: moving upstream". Journal of Occupational and Environmental Medicine 55 (12 Suppl.): S52–7. doi:10.1097/JOM.0000000000000033. PMC PMC4171364. PMID 24284749. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4171364. 
  24. Baker, T.B.; Gustafson, D.H.; Shah, D. (2014). "How can research keep up with eHealth? Ten strategies for increasing the timeliness and usefulness of eHealth research". Journal of Medical Internet Research 16 (2): e36. doi:10.2196/jmir.2925. PMC PMC3961695. PMID 24554442. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3961695. 
  25. Mohr, D.C.; Burns, M.N.; Schueller, S.M. et al. (2013). "Behavioral intervention technologies: evidence review and recommendations for future research in mental health". General Hospital Psychiatry 35 (4): 332–8. doi:10.1016/j.genhosppsych.2013.03.008. PMC PMC3719158. PMID 23664503. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3719158. 
  26. Mancini, M. (2014). "Exploiting big data for improving healthcare services". Journal of e-Learning and Knowledge Society 10 (2): 1–11. doi:10.20368/je-lks.v10i2.929. http://je-lks.org/ojs/index.php/Je-LKS_EN/article/view/928. 
  27. Youssef, A.E. (2014). "A framework for secure healthcare systems based on big data analytics in mobile cloud computing environments". International Journal of Ambient Systems and Applications 2 (2): 1–11. http://airccse.org/journal/ijasa/current2014.html. 
  28. Schilsky, R.L.; Michels, D.L.; Kearbey, A.H. et al. (2014). "Building a rapid learning health care system for oncology: The regulatory framework of CancerLinQ". Journal of Clinical Oncology 32 (22): 2373–9. doi:10.1200/JCO.2014.56.2124. PMID 24912897. 
  29. Moore, P.; Thomas, A.; Tadros, G. et al. (2013). "Detection of the onset of agitation in patients with dementia: Real-time monitoring and the application of big-data solutions". International Journal of Space-Based and Situated Computing 3 (3): 136–154. doi:10.1504/IJSSC.2013.056405. 
  30. Wang, P.; Chen, Z. (2013). "Traditional Chinese medicine ZHENG and Omics convergence: A systems approach to post-genomics medicine in a global world". OMICS 17 (9): 451–9. doi:10.1089/omi.2012.0057. PMID 23837436. 
  31. Lamarche-Vadel, A.; Pavillon, G.; Aouba, A. et al. (2014). "Automated comparison of last hospital main diagnosis and underlying cause of death ICD10 codes, France, 2008-2009". BMC Medical Informatics and Decision Making 14: 44. doi:10.1186/1472-6947-14-44. PMC PMC4057818. PMID 24898538. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4057818. 
  32. Howren, M.B.; Vander Weg, M.W.; Wolinsky, F.D. (2014). "Computerized cognitive training interventions to improve neuropsychological outcomes: Evidence and future directions". Journal of Comparative Effectiveness Research 3 (2): 145–54. doi:10.2217/cer.14.6. PMID 24645688. 
  33. Wlodarczyk, T.W.; Hacker, T.J. (2014). "Current trends in predictive analytics of big data". International Journal of Big Data Intelligence 1 (3): 172–180. doi:10.1504/IJBDI.2014.066326. 
  34. Sengupta, P.P. (2013). "Intelligent platforms for disease assessment: Novel approaches in functional echocardiography". JACC Cardiovascular Imaging 6 (11): 1206–11. doi:10.1016/j.jcmg.2013.09.003. PMID 24229773. 
  35. Issa, N.T.; Byers, S.W.; Dakshanamurthy, S. (2014). "Big data: The next frontier for innovation in therapeutics and healthcare". Expert Review of Clinical Pharmacology 7 (3): 293–98. doi:10.1586/17512433.2014.905201. PMC PMC4448933. PMID 24702684. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4448933. 
  36. Beveridge, R.; Fox, J.; Higgins, S.A. et al. (2013). "Roundtable--The changing oncology landscape: Evolution or revolution?". Journal of the National Comprehensive Cancer Network 11 (5 Suppl.): 636–8. PMID 23704232. 
  37. Kaushik, K.; Kapoor, D.; Varadharajan, V. et al. (2014). "Disease management: Clustering-based disease prediction". International Journal of Collaborative Enterprise 4 (1–2): 69–82. doi:10.1504/IJCENT.2014.065047. 

Abbreviations

ARRA: American Recovery and Reinvestment Act

EHR: electronic health record

HIE: Health Information Exchange

HIPAA: Health Insurance Portability and Accountability Act

HITECH: Health Information Technology for Economic and Clinical Health

MeSH: Medical Subject Headings

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Notes

This presentation is faithful to the original, with only a few minor changes to formatting. In several cases the PubMed ID was missing and was added to make the reference more useful.

Per the distribution agreement, the following copyright information is also being added:

©Clemens Scott Kruse, Rishi Goswamy, Yesha Raval, Sarah Marawi. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.11.2016.