Personalized services built on fine-grained behavioral analysis must be balanced against user privacy concerns. In this paper, we consider the use of web traces with truncated URLs, where each URL is trimmed to contain only the web domain, so that sensitive user information is removed. To offset the loss of accuracy in user activity profiling caused by URL truncation, we propose a statistical methodology that leverages specialized features extracted from a burst of consecutive URLs representing a micro user action. These bursts, in turn, are detected by a novel algorithm based on characteristics we observe in the inter-arrival times of HTTP records. On a real dataset of mobile web traces, consisting of more than 130 million records and 10,000 users, we show that our methodology achieves around 90% accuracy in separating URLs that represent user activities from nonrepresentative URLs.
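As an illustration only, the two preprocessing ideas mentioned above (trimming each URL to its domain and grouping consecutive records into bursts by inter-arrival time) can be sketched as follows. This is not the detection algorithm proposed in the paper: the fixed gap threshold `gap_seconds`, the function names, and the `(timestamp, url)` record layout are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def truncate_to_domain(url: str) -> str:
    """Keep only the host of a URL, dropping path, query, and fragment.

    Returns an empty string when no host can be parsed (for example,
    when the URL has no scheme).
    """
    host = urlsplit(url).hostname
    return host or ""


def group_into_bursts(records, gap_seconds=1.0):
    """Group (timestamp, url) records into bursts of truncated URLs.

    Assumption: a new burst starts whenever the gap to the previous
    record exceeds gap_seconds. A fixed threshold is a simplification
    chosen for this sketch, not the paper's method.
    """
    bursts = []
    previous_ts = None
    for ts, url in sorted(records):
        if previous_ts is None or ts - previous_ts > gap_seconds:
            bursts.append([])  # gap too large: open a new burst
        bursts[-1].append(truncate_to_domain(url))
        previous_ts = ts
    return bursts


# Hypothetical trace: three closely spaced requests, then one much later.
example_records = [
    (0.0, "https://news.example.com/article?id=42"),
    (0.2, "https://cdn.example.net/static/app.js"),
    (0.5, "https://ads.example.org/pixel"),
    (10.0, "https://mail.example.com/inbox"),
]
example_bursts = group_into_bursts(example_records)
```

With the hypothetical trace above, the first three requests fall into one burst and the last request forms its own, and every entry retains only its domain.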