Journal:The challenges of data quality and data quality assessment in the big data era
Full article title | The challenges of data quality and data quality assessment in the big data era |
---|---|
Journal | Data Science Journal |
Author(s) | Cai, Li; Zhu, Yangyong |
Author affiliation(s) | Fudan University and Yunnan University |
Primary contact | Email: lcai at fudan dot edu dot cn |
Year published | 2015 |
Volume and issue | 14 |
Page(s) | 2 |
DOI | 10.5334/dsj-2015-002 |
ISSN | 1683-1470 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | http://datascience.codata.org/articles/10.5334/dsj-2015-002/ |
Download | http://datascience.codata.org/articles/10.5334/dsj-2015-002/galley/550/download/ (PDF) |
Abstract
High-quality data are the precondition for analyzing and using big data and for guaranteeing the value of the data. Currently, comprehensive analysis and research of quality standards and quality assessment methods for big data are lacking. First, this paper summarizes reviews of data quality research. Second, this paper analyzes the data characteristics of the big data environment, presents quality challenges faced by big data, and formulates a hierarchical data quality framework from the perspective of data users. This framework consists of big data quality dimensions, quality characteristics, and quality indexes. Finally, on the basis of this framework, this paper constructs a dynamic assessment process for data quality. This process has good expansibility and adaptability and can meet the needs of big data quality assessment. The research results enrich the theoretical scope of big data and lay a solid foundation for the future by establishing an assessment model and studying evaluation algorithms.
Introduction
Many significant technological changes have occurred in the information technology industry since the beginning of the 21st century, such as cloud computing, the Internet of Things, and social networking. The development of these technologies has caused the amount of data to increase continuously and accumulate at an unprecedented speed. Together, these technologies herald the arrival of big data.[1] Currently, the amount of global data is growing exponentially. The data unit is no longer the gigabyte (GB) or terabyte (TB), but the petabyte (PB; 1 PB = 2^10 TB), exabyte (EB; 1 EB = 2^10 PB), and zettabyte (ZB; 1 ZB = 2^10 EB). According to IDC’s “Digital Universe” forecasts[2], 40 ZB of data will be generated by 2020.
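To make the unit relationships concrete, the following is a minimal sketch of the conversion arithmetic. It assumes the binary convention given in the text, where each step between adjacent units is a factor of 2^10. The names `UNITS` and `convert` are hypothetical and chosen for illustration only; they do not come from the article.

```python
# Units in ascending order; each step up is assumed to be a factor of 2**10,
# following the binary convention stated in the text.
UNITS = ["GB", "TB", "PB", "EB", "ZB"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between two units in UNITS."""
    # Positive when converting to a smaller unit, negative to a larger one.
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 2 ** (10 * steps)


print(convert(1, "PB", "TB"))   # 1024
print(convert(40, "ZB", "TB"))  # 42949672960
```

Under this convention, the 40 ZB forecast corresponds to roughly 43 billion TB. Conversions to a larger unit return a floating-point value because the exponent is negative.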
References
- ↑ Meng, X.; Ci, X. (2013). "Big Data Management: Concepts, Techniques and Challenges". Journal of Computer Research and Development 50 (1): 146–169. http://crad.ict.ac.cn/EN/abstract/abstract715.shtml.
- ↑ Gantz, J.; Reinsel, D.; Arend, C. (February 2013). "The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East — Western Europe" (PDF). IDC. http://www.emc.com/collateral/analyst-reports/idc-digital-universe-western-europe.pdf.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.