The problem of finding the best answers to a query quickly, rather than finding all answers, is of increasing importance as relational databases are applied in multimedia and decision-support domains. An approach to efficiently answering such "Top N" queries is to augment the query with an additional selection that prunes away the unwanted portion of the answer set. The risk is that if the selection returns fewer than the desired number of answers, the execution must be restarted (with a less selective filter). We propose a new, probabilistic approach to query optimization that quantifies this risk and seeks to minimize overall cost including the cost of possible restarts. We also present an extensive experimental study to demonstrate that probabilistic Top N query optimization can significantly reduce the average query execution time with relatively modest increases in the optimization time.
展开▼