Journal:DAQUA-MASS: An ISO 8000-61-based data quality management methodology for sensor data

From LIMSWiki
Full article title: DAQUA-MASS: An ISO 8000-61-based data quality management methodology for sensor data
Journal: Sensors
Author(s): Perez-Castillo, Ricardo; Carretero, Ana G.; Caballero, Ismael; Rodriguez, Moises; Piattini, Mario; Mate, Alejandro; Kim, Sunho; Lee, Dongwoo
Author affiliation(s): University of Castilla-La Mancha, AQC Lab, University of Alicante, Myongji University, GTOne
Primary contact: Email: ricardo dot pdelcastillo @ uclm dot es
Year published: 2018
Volume and issue: 18(9)
Page(s): 3105
DOI: 10.3390/s18093105
ISSN: 1424-8220
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.mdpi.com/1424-8220/18/9/3105/htm
Download: https://www.mdpi.com/1424-8220/18/9/3105/pdf (PDF)

Abstract

The internet of things (IoT) introduces several technical and managerial challenges when it comes to the use of data generated and exchanged by and between various smart, connected products (SCPs) that are part of an IoT system (i.e., physical, intelligent devices with sensors and actuators). Given the volume of data and the heterogeneous ways in which it is exchanged and consumed, it is paramount to assure that data quality levels are maintained in every step of the data chain/lifecycle. Otherwise, the system may fail to meet its expected function. While data quality (DQ) is a mature field, existing solutions are highly heterogeneous. Therefore, we propose that companies, developers, and vendors should align their data quality management mechanisms and artifacts with well-known best practices and standards, such as those provided by ISO 8000-61. This standard enables a process approach to data quality management, overcoming the difficulties of isolated data quality activities. This paper introduces DAQUA-MASS, a methodology based on ISO 8000-61 for data quality management in sensor networks. The methodology consists of four steps according to Deming's Plan-Do-Check-Act cycle.

Keywords: data quality; data quality management processes; ISO 8000-61; data quality in sensors; internet of things; IoT; smart, connected products; SCPs

Introduction

“Our economy, society, and survival aren’t based on ideas or information—they’re based on things.”[1] This is one of the core foundations of the internet of things (IoT) as stated by Ashton, who coined the term. IoT is an emerging global internet-based information architecture facilitating the exchange of goods and services.[2] IoT systems are inherently built on data gathered from heterogeneous sources in which the volume, variety, and velocity of data generation, exchange, and processing are dramatically increasing.[3] Furthermore, a semantic-oriented vision of IoT is emerging, which needs ways to represent and manipulate the vast amount of raw data expected to be generated from and exchanged between the “things.”[4]

The vast amount of data in IoT environments, gathered from a global-scale deployment of smart things, is the basis for making intelligent decisions and providing better services (e.g., smart mobility, as presented by Zhang et al.[5]). In other words, data represents the bridge that connects the cyber and physical worlds. Despite its tremendous relevance, if data are of inadequate quality, decisions from both humans and other devices are likely to be unsound.[6][7] As a consequence, data quality (DQ) has become one of the key aspects in IoT.[6][8][9][10] IoT devices, and in particular smart, connected products (SCPs), have concrete characteristics that favor the appearance of problems due to inadequate levels of data quality. Mühlhäuser[11] defines SCPs as “entities (tangible object, software, or service) designed and made for self-organized embedding into different (smart) environments in the course of its lifecycle, providing improved simplicity and openness through improved connections.” While some of the SCP-related characteristics might be considered omnipresent (i.e., uncertain, erroneous, noisy, distributed, and voluminous), other characteristics are more specific and highly dependent on the context and monitored phenomena (i.e., smooth variation, continuous, correlation, periodicity, or Markovian behavior).[6]

Also, outside of the IoT research area, DQ has been broadly studied in recent years, and it has become a mature research area capturing the growing interest of the industry due to the different types of value that companies can extract from data.[12] This fact is reflected by standardization efforts like the ISO/IEC 25000 series addressing systems and software quality requirements and evaluation (SQuaRE)[13] processes, and specific techniques for managing data concerns. We posit that such standards can be tailored and used within the IoT context, bringing benefits beyond standardizing solutions and enabling better communication between partners: the number of problems and system failures in the IoT environment is reduced, better decisions can be made thanks to better data quality, all stakeholders are aligned and can benefit from advances in the standard used, and it is easier to apply data quality solutions globally because heterogeneity is reduced.

Due to the youth of IoT, and despite the DQ standards, frameworks, management techniques, and tools proposed in the literature, DQ for IoT has not yet been widely studied. Prior to this research line, some works did address DQ concerns in wireless sensor networks[8][14] or in data streaming[15][16], among other proposals.[6] However, these works have not considered the management of DQ in a holistic way in line with existing DQ-related standards. In our attempt to align the study of DQ in IoT to international standards, this paper provides practitioners and researchers with DAQUA-MASS, a methodology for managing data quality in SCP environments, which considers some of the DQ best practices for improving quality of data in SCP environments aligned to ISO 8000-61.[17] Due to the intrinsically distributed nature of IoT systems, using such standards will enable the various organizations to be aligned to the same foundations and, in the end, to work in a seamless way, which will undoubtedly improve the performance of the business processes.

The remainder of this paper is organized as follows: the next section presents the most challenging data quality management concerns in the context of SCP environments; afterwards, related work is explored. Then the data quality model on which our methodology is based is presented. The last two sections propose a methodology for managing data quality in SCP environments and discuss conclusions and implications of this work.

Data quality challenges in SCP environments

This section introduces some general ideas about SCPs and their operation as an essential part of IoT. In addition, some challenges related to DQ in SCP environments are also introduced.

According to Cook and Das[18], a smart environment is a small world where all kinds of smart devices are continuously working to make inhabitants’ lives more comfortable. According to Mühlhäuser[11], SCPs provide intelligent actions through improved connections by means of context-awareness, semantic self-description, proactive behavior, multimodal natural interfaces, AI planning, and machine learning.

SCPs have three main core components: physical, smart, and connectivity components. Smart components extend the capabilities and value of the physical components, while connectivity extends the capabilities and value of the smart components. This enables some smart components to exist outside the physical product itself, with a cycle of value improvement.[19]

IoT and SCP can be confused in some contexts. However, IoT simply reflects the growing number of SCPs and highlights the new opportunities they can represent. IoT, which can involve people or things, is a means for interchanging information. What makes SCPs essentially different is not the "internet," but the changing nature of the “things.”[19] A product that is smart and connected to the cloud could become part of an interconnected management solution, and companies can therefore evolve from making products to offering more complex, higher-value services within a “system of systems.”[20]

SCPs include processors, sensors, software, and connectivity that allow data to be exchanged between the product and its environment. The data collected by sensors of these SCPs can be then analysed to inform decision-making, enable operational efficiencies, and continuously improve the performance of the product. This paper focuses on the data produced by such sensors, and how inadequate levels of data quality may affect the processing of the data, while smart and connectivity parts of SCPs are outside of the scope of this paper.

SCPs can be connected in large, complex networks throughout three different layers[9]: acquisition, processing, and utilization layers (see Figure 1).



Figure 1. Layers in SCP environments.

  • The acquisition layer refers to the sensor data collection system where sensors, raw (or sensed) data, and pre-processed data are managed. This is the main focus of this paper.
  • The processing layer involves data resulting from the data processing and management center, where energy, storage, and analysis capabilities are more significant.
  • The utilization layer concerns delivered data (or post-processed data) exploited, for example, over a geographic information system (GIS) or combined with other services or applications.

As previously stated, the scope of the paper is limited to the data produced by SCPs’ sensors. Hence, the proposal is mainly intended to be applied in the context of the acquisition layer. Nevertheless, the management of data quality in sensors can impact on how data is processed (processing layer) and how data may be used later (utilization layer).
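The relationship between the three layers can be illustrated with a minimal Python sketch. It is not part of the original methodology: the function names (`acquire`, `process`, `utilize`), the `Reading` type, and the plausibility range are all hypothetical, chosen only to show how a data quality check applied in the acquisition layer changes what the processing and utilization layers receive.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    sensor_id: str
    value: float
    valid: bool = True  # set by the acquisition-layer check


def acquire(raw, low, high):
    """Acquisition layer: wrap raw values and flag those outside a plausible range."""
    return [Reading(sid, v, low <= v <= high) for sid, v in raw]


def process(readings):
    """Processing layer: aggregate only the readings that passed the acquisition check."""
    valid = [r.value for r in readings if r.valid]
    return mean(valid) if valid else None


def utilize(average):
    """Utilization layer: deliver the processed value to a consumer (here, a message)."""
    return "no valid data" if average is None else f"average temperature: {average:.1f}"


# Hypothetical temperature readings; 250.0 is an implausible value.
raw = [("t1", 21.0), ("t2", 22.0), ("t3", 250.0)]
report = utilize(process(acquire(raw, low=-40.0, high=85.0)))
```

Without the range check in `acquire`, the implausible reading would distort the average that the utilization layer reports, which is the kind of downstream effect described above.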

Networking and management of SCP operations can generate the business intelligence needed to deliver smart services. Smart services are delivered to or via smart objects that feature awareness and connectivity.[21] SCPs can carry out the following functions to support smart services[22]: status, diagnostics, upgrades, control and automation, profiling and behavior tracking, replenishment and commerce, and location mapping and logistics, among others.

SCP operations enable new capabilities for companies, although the new problems and challenges that arise must also be taken into account. On one hand, SCP operations require companies to build and support an entirely new technology infrastructure.[19] Technological layers in the new technology landscape include new product hardware, embedded software, connectivity, a product cloud running on remote servers, security tools, a gateway for external information sources, and integration with enterprise business systems. On the other hand, SCP operations can provide competitive advantages, which are based on operational effectiveness. Operational effectiveness requires the embrace of best practices along the value chain, including up-to-date product technologies, the latest production equipment, state-of-the-art sales force methods, IT solutions, and so forth. Thus, SCP operations also create new best practices across the value chain.[19]

According to the different sources of data in these SCP environments, we can distinguish different types of aggregated data:

  • Sensor data: data that is generated by sensors and digitized in a computer-readable format (for example, camera sensor readings)
  • Device data: composed of sensor data; observed metadata (metadata that characterizes the sensor data, e.g., timestamp of sensor data); and device metadata (metadata that characterizes the device, e.g., device model, sensor model, manufacturer, etc.); for example, device data can be the data coming from a camera (the device)
  • General data: data related to or coming from devices that has been modified or computed to derive different data, plus business data (i.e., data for business use such as operation, maintenance, service, customers, etc.)
  • IoT data: general data plus device data
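The nesting of these four data types can be sketched as a small Python data model. This is only an illustration under stated assumptions: the class and field names are hypothetical, and the example values (device model, manufacturer, customer id) are invented for the sketch rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SensorData:
    """A digitized sensor reading (e.g., one camera sensor value)."""
    value: float


@dataclass
class DeviceData:
    """Sensor data plus observed metadata and device metadata."""
    sensor_data: SensorData
    observed_metadata: Dict[str, str]  # describes the reading, e.g., timestamp
    device_metadata: Dict[str, str]    # describes the device, e.g., model


@dataclass
class GeneralData:
    """Data derived from device data, plus business data."""
    derived: Dict[str, float]
    business: Dict[str, str]


@dataclass
class IoTData:
    """General data plus device data."""
    general: GeneralData
    device: List[DeviceData]


# Hypothetical example values, for illustration only.
reading = DeviceData(
    sensor_data=SensorData(value=0.42),
    observed_metadata={"timestamp": "2018-09-13T10:00:00Z"},
    device_metadata={"device_model": "cam-x", "manufacturer": "ExampleCorp"},
)
iot = IoTData(
    general=GeneralData(derived={"mean_value": 0.42}, business={"customer": "c-001"}),
    device=[reading],
)
```

Keeping the observed metadata and the device metadata next to each reading is what would later allow a quality problem to be traced back to a specific device or sensor model.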

A reduction in the quality of these data due to different problems in SCP operations can threaten the success factors of SCP environments.[6] The quality of produced data is often affected by dysfunctional SCP devices and sensors, which are the sources providing data, and the resulting inadequate levels of quality may only be detected later on, when data are being processed and used. Therefore, while we can identify dysfunctional SCP devices through the analysis of sensor data by using data quality management techniques, it is noteworthy that these devices will impact the rest of the sensor network. Based on Karkouch et al.[6], Table 1 summarizes some of the SCP factors that, in some cases, could condition or lead to data quality issues. In addition, the three columns on the right of Table 1 show (marked with a cross) the layers most critically affected by each SCP factor.

Table 1. SCP factors that can finally affect the levels of DQ according to Karkouch et al.[6]
SCP factor | Side effect in data quality | Acquisition / Processing / Utilization
Deployment scale | SCPs are expected to be deployed on a global scale. This leads to a huge heterogeneity in data sources (not only computers but also daily objects). Also, the huge number of devices increases the chance of errors occurring. | X X
Resource constraints | For example, limited computational and storage capabilities, due in turn to battery-power constraints among others, do not allow complex operations. | X X
Network | Intermittent loss of connection in the IoT is recurrent. Things are only capable of transmitting small-sized messages due to their scarce resources. | X X
Sensors | Embedded sensors may lack precision or suffer from loss of calibration or even low accuracy. Faulty sensors may also result in inconsistencies in data sensing. | X
Environment | SCP devices will not be deployed only in tolerant and less aggressive environments. To monitor some phenomena, sensors may be deployed in environments with extreme conditions. Data errors emerge when the sensor is influenced by the surrounding environment.[23] | X X
Vandalism | Things are generally defenseless against outside physical threats (both from humans and animals). | X X
Fail-dirty | A sensor node fails, but it keeps reporting readings, which are erroneous. It is a common problem for SCP networks and an important source of outlier readings. | X X
Privacy | Privacy-preservation processing may intentionally reduce DQ. | X
Security vulnerability | Sensor devices are vulnerable to attack, e.g., it is possible for a malicious entity to alter data in an SCP device. | X X
Data stream processing | Data gathered by smart things are sent in the form of streams to the back-end pervasive applications which make use of them. Some stream processing operators could affect the quality of the underlying data.[10] Other important factors are data granularity and variety.[24] Granularity concerns interpolation and spatio-temporal density, while variety refers to interoperability and dynamic semantics. | X X
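As one illustration of how the analysis of sensor data might surface a fail-dirty node, the sketch below flags readings that deviate strongly from the group median. It is not the method proposed in the paper: it assumes that all sensors observe the same phenomenon at the same time, the function name `flag_outliers` and the sensor ids are hypothetical, and the modified z-score with a cutoff of 3.5 is simply a commonly used robust outlier test swapped in for the example.

```python
from statistics import median


def flag_outliers(readings, threshold=3.5):
    """Return the ids of sensors whose reading deviates strongly from the group median.

    Uses the modified z-score based on the median absolute deviation (MAD).
    """
    values = list(readings.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the readings equal the median; flag anything that differs.
        return sorted(sid for sid, v in readings.items() if v != med)
    return sorted(
        sid for sid, v in readings.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Hypothetical simultaneous temperature readings; "s4" keeps reporting but is faulty.
readings = {"s1": 20.1, "s2": 19.8, "s3": 20.4, "s4": 58.0, "s5": 20.0}
suspects = flag_outliers(readings)
```

A median-based test is used here because a single faulty node can inflate the mean and standard deviation enough to hide itself, whereas the median and MAD are barely affected by one extreme value.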

References

  1. Ashton, K. (2009). "That 'Internet of Things' Thing". RFID Journal 22: 97–114. 
  2. Weber, R.H. (2013). "Internet of things – Governance quo vadis?". Computer Law & Security Review 29 (4): 341-347. doi:10.1016/j.clsr.2013.05.010. 
  3. Hassanein, H.S.; Oteafy, S.M.A. (2017). "Big Sensed Data Challenges in the Internet of Things". Proceedings from the 13th International Conference on Distributed Computing in Sensor Systems: 207–8. doi:10.1109/DCOSS.2017.35. 
  4. Atzori, L.; Iera, A.; Morabito, G. (2010). "The Internet of Things: A survey". Computer Networks 54 (15): 2787-2805. doi:10.1016/j.comnet.2010.05.010. 
  5. Zhang, W.; Zhang, Z.; Chao, H.-C. (2017). "Cooperative Fog Computing for Dealing with Big Data in the Internet of Vehicles: Architecture and Hierarchical Resource Management". IEEE Communications Magazine 55 (12): 60–7. doi:10.1109/MCOM.2017.1700208. 
  6. Karkouch, A.; Mousannif, H.; Al Moatassime, H. et al. (2016). "Data quality in internet of things: A state-of-the-art survey". Journal of Network and Computer Applications 73: 57–81. doi:10.1016/j.jnca.2016.08.002. 
  7. Merino, J.; Caballero, I.; Rivas, B. et al. (2016). "A Data Quality in Use model for Big Data". Future Generation Computer Systems 63: 123–30. doi:10.1016/j.future.2015.11.024. 
  8. Jesus, G.; Casimiro, A.; Oliveira, A. (2017). "A Survey on Data Quality for Dependable Monitoring in Wireless Sensor Networks". Sensors 17 (9): E2010. doi:10.3390/s17092010. PMC PMC5620495. PMID 28869505. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5620495. 
  9. Rodríguez, C.C.G.; Servigne, S. (2017). "Managing Sensor Data Uncertainty: A Data Quality Approach". International Journal of Agricultural and Environmental Information Systems 4 (1): 3. doi:10.4018/jaeis.2013010103. 
  10. Klein, A.; Hackenbroich, G.; Lehner, W. (2009). "How to Screen a Data Stream - Quality-Driven Load Shedding in Sensor Data Streams". Proceedings of the 2009 International Conference on Information Quality: 1–15. http://mitiq.mit.edu/iciqpapers.aspx?iciqyear=2009. 
  11. Mühlhäuser, M. (2007). "Smart Products: An Introduction". Proceedings from the 2007 European Conference on Ambient Intelligence: 158–64. doi:10.1007/978-3-540-85379-4_20. 
  12. Laney, Douglas B. (2017). Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage. Routledge. ISBN 9781138090385. 
  13. "ISO/IEC 25000:2014". International Organization for Standardization. March 2014. https://www.iso.org/standard/64764.html. Retrieved 13 September 2018. 
  14. Qin, Z.; Han, Q.; Mehrotra, S. et al. (2014). "Quality-Aware Sensor Data Management". In Ammari, H.M. The Art of Wireless Sensor Networks. Springer. pp. 429–64. ISBN 9783642400094. 
  15. Campbell, J.L.; Rustad, L.E.; Porter, J.H. et al. (2013). "Quantity is Nothing without Quality: Automated QA/QC for Streaming Environmental Sensor Data". BioScience 63 (7): 574–85. doi:10.1525/bio.2013.63.7.10. 
  16. Klein, A.; Lehner, W. (2009). "Representing Data Quality in Sensor Data Streaming Environments". Journal of Data and Information Quality 1 (2): 10. doi:10.1145/1577840.1577845. 
  17. "ISO 8000-61:2016". International Organization for Standardization. November 2016. https://www.iso.org/standard/63086.html. Retrieved 13 September 2018. 
  18. Cook, D.; Das, S.K. (2004). Smart Environments: Technology, Protocols and Applications (1st ed.). Wiley-Interscience. ISBN 9780471544487. 
  19. Porter, M.E.; Heppelmann, J.E. (2014). "How Smart, Connected Products Are Transforming Competition". Harvard Business Review 92: 64–88. https://hbr.org/2014/11/how-smart-connected-products-are-transforming-competition. 
  20. Ostrower, D. (2014). "Smart Connected Products: Killing Industries, Boosting Innovation". Wired. https://www.wired.com/insights/2014/11/smart-connected-products/. Retrieved 13 September 2019. 
  21. Wuenderlich, N.V.; Heinonen, K.; Ostrom, A.L. et al. (2015). "“Futurizing” smart service: Implications for service researchers and managers". Journal of Services Marketing 29 (6/7): 442–47. doi:10.1108/JSM-01-2015-0040. 
  22. Allmendinger, G.; Lombreglia, R. (2005). "Four Strategies for the Age of Smart Services". Harvard Business Review 83: 131. https://hbr.org/2005/10/four-strategies-for-the-age-of-smart-services. 
  23. Tilak, S.; Abu-Ghazaleh, N.B.; Heinzelman, W. (2002). "A taxonomy of wireless micro-sensor network models". ACM SIGMOBILE Mobile Computing and Communications Review 6 (2): 28–36. doi:10.1145/565702.565708. 
  24. Barnaghi, P.; Bermudez-Edo, M.; Tönjes, R. (2015). "Challenges for Quality of Data in Smart Cities". Journal of Data and Information Quality 6 (2–3): 6. doi:10.1145/2747881. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added.