Distributed Privacy Preserving Data Collection using Cryptographic Techniques

Xue, Mingqiang and Papadimitriou, Panagiotis and Raissi, Cheddy and Kalnis, Panos and Pung, Hung Keng (2009) Distributed Privacy Preserving Data Collection using Cryptographic Techniques. Technical Report. Stanford InfoLab.


PDF - Submitted for Publication


We study the distributed $k$-anonymous data collection problem: a data collector (e.g., a medical research institute) wishes to collect data (e.g., medical records) from a group of respondents (e.g., patients). Each respondent owns a multi-attribute record that contains both non-sensitive information (e.g., quasi-identifiers) and sensitive information (e.g., a particular disease), and submits it to the data collector. Letting $T$ be the table formed by all the respondents' records, we say that the data collection process is $k$-anonymous if it allows the data collector to obtain a $k$-anonymized version of $T$ without revealing the original records to any adversary. In contrast to most $k$-anonymization approaches, which trust the data collector, our work assumes that the adversary can be any third party, including the data collector and the other respondents. We propose a distributed data collection protocol that outputs a $k$-anonymized table by generalizing quasi-identifier attributes. The protocol employs cryptographic techniques such as homomorphic encryption, private information retrieval, and secure multiparty computation to achieve this privacy goal during data collection. To remain practical and efficient, the protocol is designed to leak a limited amount of non-critical information, mainly statistical information about the respondents' non-sensitive attributes. Experiments show that the utility of the $k$-anonymized table produced by our protocol is on par with that achieved by traditional $k$-anonymization techniques that trust the data collector.
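To make the target property concrete, the following is a minimal, centralized sketch of what it means for a generalized version of $T$ to be $k$-anonymous: every combination of generalized quasi-identifier values must be shared by at least $k$ records. It is only an illustration of the output condition, not the distributed cryptographic protocol described above (which never exposes raw records to the collector). The record layout `(age, zip_code, disease)`, the helpers `generalize`, `is_k_anonymous`, and `find_generalization`, and the age-range/zip-masking hierarchy are all hypothetical assumptions chosen for the example.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical record layout: (age, zip_code, disease).
# age and zip_code act as quasi-identifiers; disease is the sensitive attribute.
Record = Tuple[int, str, str]
GenRecord = Tuple[str, str, str]


def generalize(record: Record, age_width: int, zip_digits: int) -> GenRecord:
    """Coarsen the quasi-identifiers: bucket age into ranges, mask trailing zip digits."""
    age, zip_code, disease = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, masked_zip, disease)  # sensitive value is left untouched


def is_k_anonymous(table: Sequence[tuple], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    counts = Counter((age, zip_code) for age, zip_code, _ in table)
    return all(c >= k for c in counts.values())


def find_generalization(
    table: List[Record], k: int, levels: List[Tuple[int, int]]
) -> Optional[Tuple[int, int, List[GenRecord]]]:
    """Try generalization levels from finest to coarsest; return the first k-anonymous one."""
    for age_width, zip_digits in levels:
        candidate = [generalize(r, age_width, zip_digits) for r in table]
        if is_k_anonymous(candidate, k):
            return age_width, zip_digits, candidate
    return None


# Hypothetical example table T (not data from the report).
T: List[Record] = [
    (34, "47677", "flu"),
    (37, "47602", "hepatitis"),
    (36, "47678", "flu"),
    (52, "47905", "cancer"),
    (55, "47909", "flu"),
    (58, "47906", "bronchitis"),
]

# Levels ordered from least to most generalization: (age bucket width, zip digits kept).
LEVELS = [(1, 5), (5, 4), (10, 3), (100, 0)]

print(is_k_anonymous(T, k=3))  # False: every raw (age, zip) pair is unique
result = find_generalization(T, k=3, levels=LEVELS)
if result is not None:
    age_width, zip_digits, T_star = result
    print(age_width, zip_digits)  # 10 3
    for row in T_star:
        print(row)
```

Here the search stops at the first level where each equivalence class (`30-39`/`476**` and `50-59`/`479**`) contains three records. Trying the finest levels first is one simple way to preserve utility, since coarser generalizations discard more information; the report's own generalization strategy may differ.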

