Difference between revisions of "Journal:Privacy preservation techniques in big data analytics: A survey"

From LIMSWiki
This article should not be considered complete until this message box has been removed. This is a work in progress.

Revision as of 20:24, 12 November 2018

Full article title Privacy preservation techniques in big data analytics: A survey
Journal Journal of Big Data
Author(s) Rao, P. Ram Mohan; Krishna, S. Murali; Kumar, A.P. Siva
Author affiliation(s) MLR Institute of Technology, Sri Venkateswara College of Engineering, JNTU Anantapur
Primary contact Email: rammohan04 at gmail dot com
Year published 2018
Volume and issue 5
Page(s) 33
DOI 10.1186/s40537-018-0141-8
ISSN 2196-1115
Distribution license Creative Commons Attribution 4.0 International
Website https://link.springer.com/article/10.1186/s40537-018-0141-8
Download https://link.springer.com/content/pdf/10.1186%2Fs40537-018-0141-8.pdf (PDF)

Abstract

Enormous amounts of data are being generated by organizations such as hospitals, banks, e-commerce firms, retailers, and supply chain operators by virtue of digital technology. Not only humans but also machines contribute to data streams in the form of closed circuit television (CCTV) streaming, website logs, etc. Vast quantities of data are generated every minute by social media and smartphones. The voluminous data generated from these various sources can be processed and analyzed to support decision making. However, data analytics is prone to privacy violations. One application of data analytics is the recommendation system, which is widely used by e-commerce sites like Amazon and Flipkart to suggest products to customers based on their buying habits, and which can lead to inference attacks. Although data analytics is useful in decision making, it can raise serious privacy concerns. Hence, privacy-preserving data analytics has become very important. This paper examines various privacy threats, privacy preservation techniques, and models, along with their limitations. The authors then propose a data lake-based modernistic privacy preservation technique to handle privacy preservation in unstructured data.

Keywords: data, data analytics, privacy threats, privacy preservation

Introduction

There is exponential growth in the volume and variety of data due to diverse applications of computers in all domain areas. This growth has been driven by the affordable availability of computer technology, storage, and network connectivity. This large-scale data, which also includes person-specific private and sensitive data such as gender, zip code, disease, caste, shopping cart, and religion, is being stored in a variety of public and private domains. The data holder can then release this data to a third-party data analyst to gain deeper insights and identify hidden patterns that are useful in making important decisions, which may help in improving businesses and providing value-added services to customers,[1] as well as in activities such as prediction, forecasting, and recommendation.[2] One of the prominent applications of data analytics is the recommendation system, which is widely used by e-commerce sites like Amazon and Flipkart for suggesting products to customers based on their buying habits. Facebook does something similar by suggesting friends, places to visit, and even movies to watch based on users' interests. However, releasing user activity data may lead to inference attacks, such as identifying gender based on user activity.[3] We have studied a number of privacy-preserving techniques that are being employed to protect against privacy threats. Each of these techniques has its own merits and demerits. This paper explores the merits and demerits of each of these techniques and also describes the research challenges in the area of privacy preservation. There is always a trade-off between data utility and privacy. This paper also proposes a data lake-based modernistic privacy preservation technique to handle privacy preservation in unstructured data with maximum data utility.

Abbreviations

  • CCTV: closed circuit television
  • MDSBA: multidimensional sensitivity-based anonymization

References

  1. Ducange, P.; Pecori, R.; Mezzina, P. (2018). "A glimpse on big data analytics in the framework of marketing strategies". Soft Computing 22 (1): 325–42. doi:10.1007/s00500-017-2536-4. 
  2. Chauhan, A.; Kummamuru, K.; Toshniwal, D. (2017). "Prediction of places of visit using tweets". Knowledge and Information Systems 50 (1): 145–66. doi:10.1007/s10115-016-0936-x. 
  3. Yang, D.; Qu, B.; Cudre-Mauroux, P. (2018). "Privacy-Preserving Social Media Data Publishing for Personalized Ranking-Based Recommendation". IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2018.2840974. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added.