Typical approaches to ranking information in response to a user's query return the most relevant results but ignore important factors contributing to user satisfaction; for instance, the contents of a result document may be redundant given the results already examined. Motivated by emerging applications, in this work we study the problem of Diversity-Aware Search, the essence of which is ranking search results based on both their relevance and their dissimilarity to the other results reported. Diversity-Aware Search is generally a hard problem, and even tractable instances thereof cannot be solved efficiently by adapting existing approaches. We propose DivGen, an efficient algorithm for diversity-aware search, which achieves significant performance improvements via novel data access primitives. Although selecting the optimal schedule of data accesses is a hard problem, we devise the first low-overhead data access prioritization scheme with theoretical quality guarantees and good performance in practice. A comprehensive evaluation on real and synthetic large-scale corpora demonstrates the efficiency and effectiveness of our approach.