Stanford InfoLab Publication Server

Algorithms and Architectures for Data Privacy

Thomas, Dilys (2007) Algorithms and Architectures for Data Privacy. PhD thesis, Stanford University.




Abstract: The explosive progress in networking, storage, and processor technologies has resulted in an unprecedented volume of digital data. In concert with this escalating increase in digital data, concerns about the privacy of personal information have emerged globally. The ease with which data can be collected automatically, stored in databases, and queried efficiently over the internet has worsened the privacy situation and has raised numerous ethical and legal concerns. Privacy enforcement today is being handled primarily through legislation. We aim to provide technological solutions to achieve a tradeoff between data privacy and data utility. We focus on three problems in the area of database privacy in this thesis.

The first problem is that of data sanitization before publication. Publishing health and financial information for research purposes requires that the data be anonymized so that the privacy of individuals in the database is protected. This anonymized information can be used as is or can be combined with another (anonymized) dataset that shares columns or rows with the original anonymized dataset. We explore both of these scenarios in this thesis. Another reason for sanitization is to give the data to an outsourced software developer for testing software applications without the developer learning information about its client. We briefly explain such a tool in this thesis.

The second part of the thesis is auditing query logs for privacy. Given certain forbidden views of a database that must be kept confidential, a batch of SQL queries that were posed over this database, and a definition of suspiciousness, we study the problem of determining whether the batch of queries is suspicious with respect to the forbidden views.

The third part of the thesis deals with distributed architectures for data privacy. The advent of databases as an outsourced service has resulted in privacy concerns on the part of the client storing data with third-party database service providers. Previous approaches to enabling such a service have been based on data encryption, causing a large overhead in query processing. In this thesis we provide a distributed architecture for secure database services. We develop algorithms for distributing data and executing queries over this distributed data.

Item Type: Thesis (PhD)
Uncontrolled Keywords: Data Privacy, Data Mining, Algorithms, Anonymity, Distributed Architectures, Clustering
Subjects: Computer Science
Projects: PORTIA (DB-Privacy)
Related URLs: Project Homepage
ID Code: 810
Deposited By: Import Account
Deposited On: 22 Jul 2007 17:00
Last Modified: 10 Dec 2008 17:55
