The probabilistic threshold query is one of the most common queries in uncertain databases: a result must not only satisfy the query but also have a probability that meets the threshold requirement. In this paper, we investigate probabilistic threshold keyword queries (PrTKQ) over XML data, which have not been studied before. We first introduce the notion of quasi-SLCA and use it to represent results for a PrTKQ under possible world semantics. Then we design a probabilistic inverted (PI) index that can be used to quickly return the qualified answers and filter out the unqualified ones based on our proposed lower/upper bounds. After that, we propose two efficient and comparable algorithms: the Baseline Algorithm and the PI index-based Algorithm. To accelerate these algorithms, we also utilize the probability density function. An empirical study using real and synthetic data sets has verified the effectiveness and efficiency of our approaches.
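The idea of returning qualified answers and discarding unqualified ones from lower/upper bounds can be illustrated with a generic filter-and-refine loop. This is a minimal sketch, not the paper's PI index-based Algorithm. The `Candidate` type, its `lower`/`upper` fields, and the `exact_prob` callback are hypothetical names. It assumes that each candidate result node already carries precomputed probability bounds and that an exact (more expensive) probability computation is available for candidates the bounds cannot decide.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    node_id: str  # hypothetical identifier of a candidate result node
    lower: float  # assumed lower bound on the node's result probability
    upper: float  # assumed upper bound on the node's result probability


def threshold_filter(candidates: Iterable[Candidate], threshold: float,
                     exact_prob: Callable[[str], float]) -> List[str]:
    """Return the ids of candidates whose probability is >= threshold.

    The bounds decide most candidates cheaply; only the undecided ones
    trigger the (hypothetical) exact probability computation.
    """
    answers = []
    for c in candidates:
        if c.lower >= threshold:
            # Even the pessimistic bound meets the threshold: accept directly.
            answers.append(c.node_id)
        elif c.upper < threshold:
            # Even the optimistic bound misses the threshold: prune.
            continue
        elif exact_prob(c.node_id) >= threshold:
            # The bounds straddle the threshold: fall back to the exact value.
            answers.append(c.node_id)
    return answers
```

For example, with threshold 0.5, a candidate with bounds [0.7, 0.9] is accepted and one with bounds [0.1, 0.3] is pruned, neither requiring an exact computation. Only a candidate with bounds such as [0.4, 0.8] reaches `exact_prob`. The tighter the bounds, the fewer exact computations are needed, which is the motivation for bound-based filtering.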