Journal:The challenges of data quality and data quality assessment in the big data era

From LIMSWiki
Revision as of 18:48, 8 August 2016

Full article title: The challenges of data quality and data quality assessment in the big data era
Journal: Data Science Journal
Author(s): Cai, Li; Zhu, Yangyong
Author affiliation(s): Fudan University and Yunnan University
Primary contact: lcai at fudan dot edu dot cn (email)
Year published: 2015
Volume and issue: 14
Page(s): 2
DOI: 10.5334/dsj-2015-002
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International
Website: http://datascience.codata.org/articles/10.5334/dsj-2015-002/
Download: http://datascience.codata.org/articles/10.5334/dsj-2015-002/galley/550/download/ (PDF)

Abstract

High-quality data are the precondition for analyzing and using big data and for guaranteeing the value of the data. Currently, comprehensive analysis of and research on quality standards and quality assessment methods for big data are lacking. First, this paper reviews existing research on data quality. Second, it analyzes the characteristics of data in the big data environment, presents the quality challenges that big data faces, and formulates a hierarchical data quality framework from the perspective of data users. This framework consists of big data quality dimensions, quality characteristics, and quality indexes. Finally, on the basis of this framework, the paper constructs a dynamic assessment process for data quality. This process is extensible and adaptable and can meet the needs of big data quality assessment. The research results enrich the theoretical scope of big data and lay a solid foundation for future work on establishing an assessment model and studying evaluation algorithms.
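The abstract describes a three-level framework: quality dimensions, each refined into quality characteristics, each measured by quality indexes. The Python sketch below models only that nesting. The class names, the `index_paths` helper, and the placeholder labels D1, C1.1, I1.1.1 are hypothetical illustrations, not the paper's actual dimensions, and no scoring or weighting scheme is implied.

```python
from dataclasses import dataclass, field


@dataclass
class QualityIndex:
    """Lowest level: a concrete, measurable quality index."""
    name: str


@dataclass
class QualityCharacteristic:
    """Middle level: a quality characteristic measured by one or more indexes."""
    name: str
    indexes: list[QualityIndex] = field(default_factory=list)


@dataclass
class QualityDimension:
    """Top level: a quality dimension refined into characteristics."""
    name: str
    characteristics: list[QualityCharacteristic] = field(default_factory=list)


def index_paths(framework: list[QualityDimension]) -> list[str]:
    """Flatten the hierarchy into 'dimension/characteristic/index' strings, in order."""
    return [
        f"{dim.name}/{char.name}/{idx.name}"
        for dim in framework
        for char in dim.characteristics
        for idx in char.indexes
    ]


if __name__ == "__main__":
    # Placeholder labels only; these are NOT the paper's dimensions.
    framework = [
        QualityDimension("D1", [
            QualityCharacteristic("C1.1", [QualityIndex("I1.1.1"), QualityIndex("I1.1.2")]),
        ]),
        QualityDimension("D2", [
            QualityCharacteristic("C2.1", [QualityIndex("I2.1.1")]),
        ]),
    ]
    print(index_paths(framework))
    # ['D1/C1.1/I1.1.1', 'D1/C1.1/I1.1.2', 'D2/C2.1/I2.1.1']
```

Because each level is a plain list, a data user could add or drop dimensions, characteristics, or indexes for a particular assessment without changing the structure, which loosely reflects the extensibility the abstract claims for the framework.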

Introduction

Since the beginning of the 21st century, the information technology industry has undergone many significant technological changes, such as cloud computing, the Internet of Things, and social networking. The development of these technologies has caused data to accumulate continuously and at an unprecedented speed. Together, these technologies herald the arrival of big data.[1] The amount of global data is currently growing exponentially. Data volumes are no longer measured in gigabytes (GB) and terabytes (TB) but in petabytes (PB; 1 PB = 2^10 TB), exabytes (EB; 1 EB = 2^10 PB), and zettabytes (ZB; 1 ZB = 2^10 EB). According to IDC’s “Digital Universe” forecast,[2] 40 ZB of data will be generated by 2020.
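Read as powers of two (1 PB = 2^10 TB, and so on), the unit definitions above chain multiplicatively, so 1 ZB = 2^30 TB. The short Python sketch below applies that rule; the `to_terabytes` function and the `UNITS` list are hypothetical names introduced for illustration, and the code assumes binary multiples of 2^10 per step as stated in the text.

```python
# Storage units in increasing order; following the text, each unit is
# taken to be 2**10 (1,024) of the unit immediately below it.
UNITS = ["GB", "TB", "PB", "EB", "ZB"]


def to_terabytes(amount: float, unit: str) -> float:
    """Convert an amount expressed in GB, TB, PB, EB, or ZB into TB.

    Raises ValueError if `unit` is not one of UNITS.
    """
    steps = UNITS.index(unit) - UNITS.index("TB")  # negative for GB
    return amount * (2 ** 10) ** steps


if __name__ == "__main__":
    print(to_terabytes(1, "PB"))   # 1024
    print(to_terabytes(1, "ZB"))   # 1073741824, i.e. 2**30
    print(to_terabytes(40, "ZB"))  # 42949672960
```

Under SI decimal prefixes, where each step is a factor of 1,000 rather than 1,024, the same 40 ZB would instead be 4 × 10^10 TB; which convention a given forecast uses should be checked against its source.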

The arrival of the big data era has attracted the attention of industry, academia, and government. For example, in 2012 the U.S. government invested $200 million to launch the "Big Data Research and Development Initiative."[3] Nature published a special issue on big data.[4] Science also published a special issue, "Dealing with Data," which illustrated the importance of big data for scientific research.[5] In addition, big data has been developed and used widely in medicine, retail, finance, manufacturing, logistics, telecommunications, and other industries, generating great social value and industrial potential.[6]

By rapidly acquiring and analyzing big data from various sources and for various uses, researchers and decision-makers have gradually realized that this massive amount of information helps them understand customer needs, improve service quality, and predict and prevent risks. However, using and analyzing big data requires accurate, high-quality data; such data are a necessary condition for generating value from big data. We therefore analyzed the quality challenges faced by big data and proposed a quality assessment framework and an assessment process for it.

Literature review on data quality

In the 1950s, researchers began to study quality issues, particularly product quality, and a series of definitions of quality were published (for example, quality is "the degree to which a set of inherent characteristics fulfill the requirements"[7]; "fitness for use"[8]; and "conformance to requirements"[9]). Later, with the rapid development of information technology, research turned to data quality.


References

  1. Meng, X.; Ci, X. (2013). "Big Data Management: Concepts, Techniques and Challenges". Journal of Computer Research and Development 50 (1): 146–169. http://crad.ict.ac.cn/EN/abstract/abstract715.shtml. 
  2. Gantz, J.; Reinsel, D.; Arend, C. (February 2013). "The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East — Western Europe" (PDF). IDC. http://www.emc.com/collateral/analyst-reports/idc-digital-universe-western-europe.pdf. Retrieved February 2013. 
  3. Li, G.J.; Chen, X.Q. (2012). "Research Status and Scientific Thinking of Big Data". Bulletin of Chinese Academy of Sciences 27 (6): 648–657. 
  4. "Big Data". Nature. Macmillan Publishers Limited. 3 September 2008. http://www.nature.com/news/specials/bigdata/index.html. Retrieved 5 November 2013. 
  5. "Dealing with Data". Science. American Association for the Advancement of Science. 11 February 2011. http://www.sciencemag.org/site/special/data/. Retrieved 5 November 2013. 
  6. Feng, Z.Y.; Guo, X.H.; Zeng, D.J. et al. (2013). "On the research frontiers of business management in the context of Big Data". Journal of Management Sciences in China 16 (1): 1–9. 
  7. "Quality management system - Fundamentals and vocabulary (GB/T19000—2008/ISO9000:2005)" (PDF). General Administration of Quality Supervision. 29 October 2008. http://sc.ccic.com/uploadfile/2014/0811/20140811053122619.pdf. 
  8. Wang, R.Y.; Strong, D.M. (1996). "Beyond accuracy: What data quality means to data consumers". Journal of Management Information Systems 12 (4): 5–33. doi:10.1080/07421222.1996.11518099. 
  9. Crosby, P.B. (1979). Quality Is Free: The Art of Making Quality Certain. McGraw-Hill Companies. 309 pp. ISBN 978-0070145122. 

Notes

This presentation is faithful to the original, with only a few minor changes to formatting. In some cases important information was missing from the references, and that information was added.