
Top-K Queries on Uncertain Data: On Score Distribution and Typical Answers


Abstract

Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along this score-probability dimension. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in previous work in the area. Our work includes a systematic empirical study on both real and synthetic datasets.
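To make the notion of a score distribution over top-k vectors concrete, here is a minimal brute-force sketch. It is not the paper's algorithm. It assumes, purely for illustration, that each tuple exists independently with a given probability and that a top-k vector is scored by the sum of its tuples' scores; the function name `topk_score_distribution` and the input layout are hypothetical. Because it enumerates every possible world, its cost grows exponentially with the number of tuples, which illustrates why the complete distribution is too large to compute directly.

```python
from collections import defaultdict
from itertools import product


def topk_score_distribution(tuples, k):
    """Brute-force distribution of the total score of the top-k vector.

    tuples: list of (score, probability) pairs. Assumption: each tuple
            exists independently with the given probability.
    k:      number of tuples in a top-k vector.

    Returns a dict mapping total score -> probability. Worlds containing
    fewer than k tuples have no top-k vector and are skipped, so the
    probabilities may sum to less than 1.
    """
    dist = defaultdict(float)
    # Enumerate all 2^n possible worlds (exponential; illustration only).
    for world in product([False, True], repeat=len(tuples)):
        prob = 1.0
        present = []
        for exists, (score, p) in zip(world, tuples):
            prob *= p if exists else (1.0 - p)
            if exists:
                present.append(score)
        if len(present) < k or prob == 0.0:
            continue
        # Assumption: the score of a top-k vector is the sum of its scores.
        total = sum(sorted(present, reverse=True)[:k])
        dist[total] += prob
    return dict(dist)
```

For example, `topk_score_distribution([(10, 0.5), (8, 0.5), (5, 1.0)], k=2)` assigns probability 0.25 to each of the totals 18, 15, and 13. The remaining 0.25 belongs to the world in which only the third tuple exists, so no top-2 vector can be formed.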
