Most existing personalization systems rely on site-centric user data, in which the inputs available to the system are the user's behaviors on a single site. We use a dataset supplied by a major audience measurement company that represents a complete user-centric view of clickstream behavior. We use the supplied product purchase metadata to set up a prediction problem: for multiple product categories, we learn models of the user's probability of purchase within a time window, using features that represent the user's browsing and search behavior across all websites. As a baseline, we compare our results against the best such models that can be learned from site-centric data at a major search engine site. We demonstrate substantial improvements in accuracy with comparable and often better recall. We also propose a novel behaviorally based (as opposed to syntactically based) search term suggestion algorithm for feature selection on clickstream data. Finally, our models are not privacy invasive. If deployed client-side, they amount to a dynamic "smart cookie" that expresses a user's individual intentions with a precise probabilistic interpretation.