Privacy preservation is realized by transforming data into k-anonymous (k-anonymization) and l-diverse (l-diversification) versions while minimizing information loss. Frequency l-diversity is possibly the most practical instance of the generic l-diversity principle for privacy preservation. In this paper, we propose an algorithm for frequency l-diversification whose primary objective is to minimize information loss. Most studies in privacy preservation have focused on k-anonymization. While l-diversification algorithms for simple instances of the principle can be obtained by adapting k-anonymization algorithms, such adaptation is not straightforward for some other instances. Our algorithm, called Bucket Clustering, adapts k-member Clustering. To guarantee termination, however, it uses hashing and buckets, as in the Anatomy algorithm. To minimize information loss, it selects, during the creation of each cluster, the tuples that incur the least information loss. We empirically show that our algorithm achieves low information loss with acceptable efficiency.
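For concreteness, the frequency l-diversity condition can be illustrated with a short check. This is a minimal sketch and not the Bucket Clustering algorithm itself. It assumes the common definition under which a cluster is frequency l-diverse when no sensitive value accounts for more than a 1/l fraction of its tuples. The function name `is_frequency_l_diverse`, the representation of each cluster as a list of sensitive values, and the sample data are hypothetical choices made for illustration.

```python
from collections import Counter


def is_frequency_l_diverse(clusters, l):
    """Return True if, in every cluster, no sensitive value accounts for
    more than a 1/l fraction of the cluster's tuples.

    `clusters` is a list of clusters; each cluster is given here only by
    the list of sensitive values of its tuples (an illustrative assumption).
    """
    for sensitive_values in clusters:
        if not sensitive_values:
            # Assumption: an empty cluster is treated as invalid.
            return False
        most_common_count = Counter(sensitive_values).most_common(1)[0][1]
        # Integer comparison avoids floating-point division:
        # count / size > 1 / l  is equivalent to  count * l > size.
        if most_common_count * l > len(sensitive_values):
            return False
    return True


# Hypothetical sample data: two clusters of sensitive values.
clusters = [["flu", "cancer", "flu", "asthma"], ["flu", "cancer"]]
print(is_frequency_l_diverse(clusters, 2))
print(is_frequency_l_diverse(clusters, 3))
```

In this sample, the most frequent value in the first cluster covers half of its tuples, so the check passes for l = 2 and fails for l = 3.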