Publishing Web search logs is very useful to the scientific research community, but to preserve users' privacy the logs must first undergo an anonymization process. Random query swapping is a common technique for protecting logs; it provides k-anonymity to users at the cost of some loss of utility. On the assumption that this utility loss can be reduced by swapping semantically close queries, we introduce a novel protection method that semantically microaggregates the logs using the Open Directory Project. That is, we extend a common statistical disclosure control method to protect search logs from a semantic perspective. The method has been tested on a random subset of the AOL search logs, and the resulting protected logs were observed to improve data usefulness.