Journal:Secure data outsourcing in presence of the inference problem: Issues and directions

From LIMSWiki
Revision as of 17:25, 6 April 2021 by Shawndouglas (talk | contribs) (Saving and adding more.)
Jump to navigationJump to search
Full article title Secure data outsourcing in presence of the inference problem: Issues and directions
Journal Journal of Information and Telecommunication
Author(s) Jebali, Adel; Sassi, Salma; Jemai, Akderrazak
Author affiliation(s) Tunis El Manar University, Jendouba University, Carthage University
Primary contact Email: adel dot jbali at fst dot utm dot tn
Year published 2020
Volume and issue 5(1)
Article # 16–34
DOI 10.1080/24751839.2020.1819633
ISSN 2475-1847
Distribution license Creative Commons Attribution 4.0 International
Website https://www.tandfonline.com/doi/full/10.1080/24751839.2020.1819633
Download https://www.tandfonline.com/doi/pdf/10.1080/24751839.2020.1819633 (PDF)

Abstract

With the emergence of the cloud computing paradigms, secure data outsourcing—moving some or most data to a third-party provider of secure data management services—has become one of the crucial challenges of modern computing. Data owners place their data among cloud service providers (CSPs) in order to increase flexibility, optimize storage, enhance data manipulation, and decrease processing time. Nevertheless, from a security point of view, access control proves to be a major concern in this situation seeing that the security policy of the data owner must be preserved when data is moved to the cloud. The lack of a comprehensive and systematic review on this topic in the available literature motivated us to review this research problem. Here, we discuss current and emerging research on privacy and confidentiality concerns in cloud-based data outsourcing and pinpoint potential issues that are still unresolved.

Keywords: cloud computing, data outsourcing, access control, inference leakage, secrecy and privacy

Introduction

In light of the increasing volume and variety of data from diverse sources—e.g., from health systems, social insurance systems, scientific and academic data systems, smart cities, and social networks—in-house storage and processing of large collections of data has becoming very costly. Hence, modern database systems have evolved from a centralized storage architecture to a distributed one, and with it the database- as-a-service paradigm has emerged. Data owners are increasingly moving their data to cloud service providers (CSPs) in order to increase flexibility, optimize storage, enhance data manipulation, and decrease processing times. Nonetheless, security concerns are widely recognized as a major barrier to cloud computing and other data outsourcing or database-as-a-service arrangements. Users remain reluctant to place their sensitive data in the cloud due to concerns about data disclosure to potentially untrusted external parties and other malicious parties.[1] Being processed and stored externally, data owners feel they have little control over their sensitive data, consequently putting data privacy at risk. From this perspective, access control is a major challenge seeing that the security policy of a data owner must be preserved when data is moved to the cloud. Access control policies are enforced by CSPs by keeping some sensitive data separated from each other.[2] However, some techniques like encryption are helpful to better guarantee the confidentiality of data.[3][4][5] The intent of encryption is to break sensitive associations among outsourced data by encrypting some attributes of that data. However, other data security concerns exist as well. Security breaches in distributed cloud databases could be exacerbated due to inference leakage, which occurs when a malicious actor uses information from a legitimate public response to discover more sensitive information, often from metadata. During the last two decades, researchers have devoted significant effort to enforcing access control policies and privacy protection requirements externally while maintaining a balance with data utility.[6][7][8][9][10][11][12]

In this paper, we review the current and emerging research on privacy and confidentiality concerns in data outsourcing and highlight research directions in this field. In summary, our systematic review addresses security concerns in cloud database systems for both communicating and non-communicating servers. We also survey this research field in relation to the inference problem and the unresolved problems that are introduced. Recognizing these challenges, this paper provides an overview of our proposed (because this is an ongoing work) solution. The crux of that solution is to firstly optimize data distribution without the need to query the workload, then partition the database in the cloud by taking into consideration access control policies and data utility, before finally running a query evaluation model on a big data framework to securely process distributed queries while retaining access control.

The reminder of this paper is organized as follows. The next section describes the literature review methodology adopted in this paper. After that, we review emerging research on data outsourcing in the context of privacy concerns and data utility. Then we discuss data outsourcing in relation to the inference problem. Afterwards, we introduce our proposed solution to implement a secure distributed cloud database on a big data framework (Apache Spark). We close with future research directions and challenges, as well as our final conclusions.

Literature review methodology

The methodology for literature review adopted in this paper follows the checklist proposed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.[13] It includes, as shown in Figure 1, three steps: input literature, processing steps, and review output.


Fig1 Jebali JofInfoTelec2020 5-1.jpg

Figure 1. Three stages of the literature review process.

Input literature

In this section we describe selected literature and their selection process. Firstly, our advance keyword research was conducted on the Google Scholar search engine, with a time filter from January 1, 1990 to December 31, 2019. Table 1 lists keywords used in different Google Scholar queries.

Table 1. Keywords used in our literature review search.
Keyword Number of viewed papers
Access control, Data outsourcing 43
Cloud computing, Authorization policies 78
Database, inference leakage 33
Confidentiality constraints, Cloud database 41
Secure data integration 11
Big data, Distributed query processing 39
Privacy, data publishing 24

The logical operator used between keywords during search was the "And" operator. Finally, from the 269 viewed papers, 43 articles were retained for review. Figure 2 shows their distribution by publication year.


Fig2 Jebali JofInfoTelec2020 5-1.jpg

Figure 2. Distribution of the 43 articles retained for review, by publication year.

Processing steps

During the review, papers were processed by identifying the problem, understanding the proposed solution process, and listing the important findings. We summarized and compared each paper with the papers associated with the similar problem. Then for each processed paper, three or four critical sentences was introduced to highlight the limits and specify potential directions that may be followed to enhance the proposed approaches. Based on our literature review, we classified the data outsourcing and access control papers into three categories, as shown in Figure 3. The first category of papers addresses the problem of secure data outsourcing when the servers in the cloud are unaware of each other. The second category addresses secure data outsourcing where interaction between servers exists and how this later can aggravate the situation. In the last category, we address data outsourcing in relation to the inference problem, as this later can exploit semantic constraints to bypass authorization policies at the cloud level.


Fig3 Jebali JofInfoTelec2020 5-1.jpg

Figure 3. Classification scheme of the literature retained for review.

Review output

The outcome of the methodological review process is presented later in this paper as our proposed solution. In the "Proposed solution" section, we present an incremental approach composed of three steps, each step treating one of the three problems mentioned in the previous subsection. We believe that our proposed solution is capable of providing good results compared to other reviewed approaches. Afterwards, we report other potential future research areas and challenges.

Preserving confidentiality in data outsourcing scenarios

There is a consensus in the security research community about the efficiency of data outsourcing for solving data management problems.[2] This consists of moving data from in-house storage to cloud databases, while also maintaining a balance between data confidentiality and utility (Figure 4).


Fig4 Jebali JofInfoTelec2020 5-1.jpg

Figure 4. A representation of secure data outsourcing.

CSPs are considered honest-but-curious: the database servers answer user queries correctly and do not manage stored data, but they attempt to intelligently analyze data and queries in order to learn as much information as possible from them.

Two powerful techniques have been proposed to enforce access control in cloud databases. The first technique exploits vertical database fragmentation to keep some sensitive data separated from each other. The second technique resorts to encryption to make a single attribute invisible to unauthorized users. These two techniques can be implemented using the following approaches:

  • Full outsourcing: The entire in-house database is moved to the cloud. It considers vertical database fragmentation to enforce confidentiality constraints with more than two attributes by keeping them separated from each other among distributed servers. Moreover, it resorts to encryption in order to hide confidentiality constraints with a single attribute.[6]
  • "Keep a few": This approach departs from encryption by involving the owner side. The attributes to be encrypted are stored in plain text on the owner side since this later is considered a trusted part. The rest of the database is distributed among servers while maintaining data confidentiality through vertical fragmentation.[11]

Aside from the fact that encrypting data for storing them externally carries a considerable cost[2], previous studies have primarily concentrated on non-communicating cloud servers.[6][10][11][12][14] In this situation, servers are unaware of each other and do not exchange any information. When a master node receives a query, it decomposes it and processes it locally without the need to perform a join query. In recent years, researchers have studied the effect of communication between servers on query execution, and secure query evaluation strategies have been elaborated.[4][15][16][17]

In the rest of this section, we discuss current and emerging research efforts in each of the three mentioned architectures.

Secure data outsourcing with non-communicating servers

In 2005, Aggarwal et al. presented some of the earliest research attempting to enforce access control in database outsourcing using vertical fragmentation.[6] Under the assumption that servers do not communicate, the work aimed to split the database on two untrusted servers while preserving data privacy, with some of the attributes possibly encrypted. They demonstrated a secure fragmentation of a relation R is a triple (F1 , F2 , E) where F1 , F2 contain attributes in plain text stored in different servers and E is the set of encrypted attributes. The tuple identifier and the encrypted attributes were replicated with each fragment. The protection measures were also augmented by a query evaluation technique defining how queries on the original table can be transformed into queries on the fragmented table.

The work of Hudic et al.[18] introduces an approach to enforce confidentiality and privacy while outsourcing data to a CSP. The proposed technique relies on vertical fragmentation and applies only minimal encryption to preserve data exposure to malicious parties. However, the fragmentation algorithm enforces the database logic schema to be in a third normal form to produce a good fragmentation design, and the query execution cost was not proven to be minimal.

In 2007, Ciriani et al.[5] addressed the problem of privacy preserving data outsourcing by resorting to the combination of fragmentation and encryption. The former is exploited to break sensitive associations between attributes, while the latter enforces the privacy of singleton confidentiality constraints. The authors go on to define a formal model of minimal fragmentation and propose a heuristic minimal fragmentation algorithm to efficiently execute queries over fragments while preserving security properties. However, when a query executed over a fragment involves an attribute that is encrypted, an additional query is executed to evaluate the conditions of the attribute, leading to performance degradation by slowing down query processing.

In 2011, Ciriani et al.[19] addressed the concept of secure data publishing in the presence of confidentiality and visibility constraints. By modelling these two constraints as Boolean formulas and fragment as complete truth assignments, the authors rely on the Ordered Binary Decision Diagrams (OBDD) technique to check whether a fragmentation satisfies confidentiality and visibility constraints. The proposed algorithm runs using OBDD and returns a fragmentation that guarantees correctness and minimality. However, query execution cost was not investigated in this paper, and the algorithm runs only on a database schema with a single relation.

Xu et al.[1] studied the problem of finding secure fragmentation with minimum cost for query support in 2015. Firstly, they define the cost of a fragmentation F as the sum of the cost of each query Qi executed on F multiplied by the execution frequency of Qi. Secondly, they resort to using a heuristic local search graph-based approach to obtain near optimal fragmentation. The search space was modelled as a fragmentation graph, and transformation between fragmentation as a set of edges E. Then, two search strategies where proposed: a static search strategy, which is invariant with the number of steps in a solution path, as well as a dynamic search strategy based on guided local search, which guarantees the safeness of the final solution while avoiding a dead-end. However, this paper does not investigate visibility constraints, which is an important concept for data utility. Moreover, other heuristic search techniques could have been addressed (e.g., Tabu search or simulated annealing).

The 2009 work of Ciriani et al.[11] puts forward a new paradigm to securely publishing data in the cloud while completely departing from encryption, since encryption is sometimes considered a very rigid tool that is delicate in its configuration, while potentially slowing down query processing. The idea behind this work is to engage the owner side (assumed to be a trusted party) to store a limited portion of data (that is supposed to be encrypted) in the clear and use vertical fragmentation to break sensitive associations among data to be stored in the cloud. Their proposed algorithm computes a fragmentation solution that minimizes the load for the data owner while guaranteeing privacy concerns. Moreover, authors highlight other metrics that can be used to characterize the quality of a fragmentation and decide which attribute is affected to the client side and which attribute is externalized. However, engaging the client to enforce access control requires mediating every query in the system, which could lead to bottlenecks and negatively impact performance.

In 2017, Bollwein and Wiese[8] proposed a separation of duties technique based on vertical fragmentation to address the problem of preserving confidentiality when outsourcing data to a CSP. To ensure privacy requirements were met, confidentiality constraints and data dependencies were introduced. The separation of duties problem was treated as an optimization problem to maximize the utility of the fragmented database and to enhance query execution over the distributed servers. However, the optimization problem was addressed only from the point of minimizing the number of distributed servers. Additionally, when collaboration between servers is established, the separation of duties approach is no longer efficient to preserve confidentiality constraints. The NP-hardness proofness of the separation of duties problem discussed in Bollwein and Wiese[8] was later proven by the authors the following year.[9] The separation of duties problem was addressed as an optimization problem by the combination of the two famous NP-hardness problems: bin packing and vertex coloring. The bin packing problem was introduced to take into consideration the capacity constraints of the servers, with the view that fragments should be placed in a minimum number of servers without exceeding the maximum capacity. Meanwhile, vertex coloring was introduced to enforce confidentiality constraints, seeing that the association of certain attributes in the same server violates confidentiality propriety. We should note, however, that this paper studies the separation of duties problem for single-relation databases, and to make the theory applicable in practical scenarios, a many-relations database should be used.

Keeping in mind the fact that communication between distributed servers in data outsourcing scenarios exacerbates privacy concerns, secure query evaluation strategies should be adopted. In the next subsection we investigate prior research on secure data outsourcing with communicating servers.

Secure data outsourcing: The case of communicating servers

References

  1. 1.0 1.1 Xu, X.; Xiong, L.; Liu, J. (2015). "Database Fragmentation with Confidentiality Constraints: A Graph Search Approach". Proceedings of the 5th ACM Conference on Data and Application Security and Privacy: 263–70. doi:10.1145/2699026.2699121. 
  2. 2.0 2.1 2.2 Samarati, P.; di Vimarcati, S.D.C. (2010). "Data protection in outsourcing scenarios: Issues and directions". Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security: 1–14. doi:10.1145/1755688.1755690. 
  3. Biskup, J.; Preuß, M. (2013). "Database Fragmentation with Encryption: Under Which Semantic Constraints and A Priori Knowledge Can Two Keep a Secret?". Data and Applications Security and Privacy XXVIII: 17–32. doi:10.1007/978-3-642-39256-6_2. 
  4. 4.0 4.1 Bkakria, A.; Cuppens, F.; Cuppens-Boulahia, N. et al. (2013). "Preserving Multi-relational Outsourced Databases Confidentiality using Fragmentation and Encryption". JoWUA 4 (2): 39–62. doi:10.22667/JOWUA.2013.06.31.039. 
  5. 5.0 5.1 Ciriani, V.; di Vimaercati, S.D.C.; Foresti, S. et al. (2007). "Fragmentation and Encryption to Enforce Privacy in Data Storage". Computer Security - ESORICS 2007: 171–86. doi:10.1007/978-3-540-74835-9_12. 
  6. 6.0 6.1 6.2 6.3 Aggarwal, G.; Bawa, M.; Ganesan, P. et al. (2005). "Two Can Keep a Secret: A Distributed Architecture for Secure Database Services". Second Biennial Conference on Innovative Data Systems Research: 1–14. http://ilpubs.stanford.edu:8090/659/. 
  7. Alsirhani, A.; Bodorik, P. Sampalli, S. (2017). "Improving Database Security in Cloud Computing by Fragmentation of Data". Proceedings of the 2017 International Conference on Computer and Applications: 43–49. doi:10.1109/COMAPP.2017.8079737. 
  8. 8.0 8.1 8.2 Bollwein, F.; Wiese, L. (2017). "Separation of Duties for Multiple Relations in Cloud Databases as an Optimization Problem". Proceedings of the 21st International Database Engineering & Applications Symposium: 98–107. doi:10.1145/3105831.3105873. 
  9. 9.0 9.1 Bollwein, F.; Wiese, L. (2018). "On the Hardness of Separation of Duties Problems for Cloud Databases". Proceedings of TrustBus 2018: Trust, Privacy and Security in Digital Business: 23–38. doi:10.1007/978-3-319-98385-1_3. 
  10. 10.0 10.1 Ciriani, V.; di Vimercati, S.D.C.; Foresti, S. et al. (2009). "Fragmentation Design for Efficient Query Execution over Sensitive Distributed Databases". Proceedings of the 29th IEEE International Conference on Distributed Computing Systems: 32–39. doi:10.1109/ICDCS.2009.52. 
  11. 11.0 11.1 11.2 11.3 Ciriani, V.; di Vimercati, S.D.C.; Foresti, S. et al. (2009). "Keep a Few: Outsourcing Data While Maintaining Confidentiality". Computing Security - ESORICS 2009: 440–55. doi:10.1007/978-3-642-04444-1_27. 
  12. 12.0 12.1 di Vimercati, S.D.C.; Foresti, S.; Jajodia, S. et al. (2014). "Fragmentation in Presence of Data Dependencies". IEEE Transactions on Dependable and Secure Computing 11 (6): 510–23. doi:10.1109/TDSC.2013.2295798. 
  13. Moher, D.; Liberti, A.; Tetzlaff, J. et al. (2009). "Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement". PLoS One 6 (7): e1000097. doi:10.1371/journal.pmed.1000097. PMC PMC2707599. PMID 19621072. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2707599. 
  14. di Vimercati, S.D.C.; Foresti, S.; Jajodia, S. et al. (2010). "Fragments and loose associations: respecting privacy in data publishing". Proceedings of the VLDB Endowment 3 (1–2). doi:10.14778/1920841.1921009. 
  15. Bkakria, A.; Cuppens, F.; Cuppens-Boulahia, N. et al. (2013). "Confidentiality-Preserving Query Execution of Fragmented Outsourced Data". Proceeding of ICT-EurAsia 2013: Information and Communication Technology 4 (2): 426-440. doi:10.1007/978-3-642-36818-9_47. 
  16. di Vimarcati, S.D.C.; Foresti, S.; Jajodia, S. et al. (2016). "Efficient integrity checks for join queries in the cloud". Journal of Computer Security 24 (3): 347–78. doi:10.3233/JCS-160545. 
  17. di Vimarcati, S.D.C.; Foresti, S.; Jajodia, S. et al. (2013). "Integrity for join queries in the cloud". IEEE Transactions on Cloud Computing 1 (2): 187–200. doi:10.1109/TCC.2013.18. 
  18. Hudic, A.; Islam, S.; Kieseberg, P. et al. (2013). "Data confidentiality using fragmentation in cloud computing". International Journal of Pervasive Computing and Communications 9 (1): 37–51. doi:10.1108/17427371311315743. 
  19. Ciriani, V.; di Vimercati, S.D.C.; Foresti, S. et al. (2011). "Enforcing Confidentiality and Data Visibility Constraints: An OBDD Approach". Proceeding of DBSec 2011: Data and Applications Security and Privacy XXV: 44–59. doi:10.1007/978-3-642-22348-8_6. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage was substantially updated for improved readability. In some cases important information was missing from the references, and that information was added. The original paper listed references alphabetically; this wiki lists them by order of appearance, by design.