The aim of privacy-preserving data mining is to construct highly accurate predictive models without disclosing private information. Aggregation functions, such as sum and count, are often used to pre-process data before data mining techniques are applied to relational databases. It is often implicitly assumed that aggregated (or summarized) data are less likely to lead to privacy violations during data mining. This paper investigates that assumption within the relational database domain. We introduce the PBIRD (Privacy Breach Investigation in Relational Databases) methodology. Our experimental results show that aggregation can introduce new privacy violations: the potentially harmful attributes obtained with aggregation are often different from those obtained from non-aggregated databases. This indicates that, even when privacy is enforced on non-aggregated data, it is not automatically enforced on the corresponding aggregated data. Consequently, special care should be taken during model building to fully enforce privacy when the data are aggregated.