首页> 外文会议>Scientific and statistical database management >A Truly Dynamic Data Structure for Top-k Queries on Uncertain Data
【24h】

A Truly Dynamic Data Structure for Top-k Queries on Uncertain Data

机译:不确定数据上前k个查询的真正动态数据结构

获取原文
获取原文并翻译 | 示例

摘要

Top-k queries allow end-users to focus on the most important (top-k) answers amongst those which satisfy the query. In traditional databases, a user defined score function assigns a score value to each tuple and a top-k query returns K tuples with the highest score. In uncertain database, top-k answer depends not only on the scores but also on the membership probabilities of tuples. Several top-k definitions covering different aspects of score-probability interplay have been proposed in recent past [20, 13, 6, 18]. Most of the existing work in this research field is focused on developing efficient algorithms for answering top-k queries on static uncertain data. Any change (insertion, deletion of a tuple or change in membership probability, score of a tuple) in underlying data forces re-computation of query answers. Such re-computations are not practical considering the dynamic nature of data in many applications. In this paper, we propose a truly dynamic data structure that uses ranking function PRF~e(α) proposed by Li et al. [18] under the generally adopted model of x-relations [21]. PRF~e can effectively approximate various other top-k definitions on uncertain data based on the value of parameter α. An x-relation consists of a number of x-tuples, where x-tuple is a set of mutually exclusive tuples (up to a constant number) called alternatives. Each x-tuple in a relation randomly instantiates into one tuple from its alternatives. For an uncertain relation with TV tuples, our structure can answer top-k queries in O(k log N) time, handles an update in O(log N) time and takes O(N) space. Finally, we evaluate practical efficiency of our structure on both synthetic and real data.
机译:前k个查询使最终用户可以专注于满足查询条件的最重要的(前k个)答案。在传统数据库中,用户定义的分数函数将分数值分配给每个元组,并且前k个查询返回具有最高分数的K个元组。在不确定的数据库中,top-k答案不仅取决于分数,还取决于元组的隶属概率。近年来,已经提出了覆盖得分-概率相互作用的不同方面的几个前k个定义[20、13、6、18]。该研究领域中的大多数现有工作都集中在开发有效的算法上,以对静态不确定数据进行top-k查询。基础数据中的任何更改(插入,删除元组或成员资格概率变化,元组的分数)都将强制重新计算查询答案。考虑到许多应用程序中数据的动态性质,这种重新计算是不实际的。在本文中,我们提出了一种真正的动态数据结构,该结构使用了Li等人提出的排名函数PRF〜e(α)。 [18]在普遍采用的x关系模型下[21]。 PRF_e可以基于参数α的值有效地近似不确定数据上的其他top-k定义。 x关系由多个x元组组成,其中x元组是一组互斥的元组(最多为常数),称为替代。关系中的每个x元组从其选择中随机实例化为一个元组。对于与电视元组的不确定关系,我们的结构可以在O(k log N)时间内回答前k个查询,在O(log N)时间内处理更新,并占用O(N)空间。最后,我们在综合数据和真实数据上评估结构的实际效率。

著录项

  • 来源
  • 会议地点 Portland OR(US);Portland OR(US)
  • 作者单位

    Computer Science Department, Louisiana State University, Baton Rouge, LA, USA;

    Computer Science Department, Louisiana State University, Baton Rouge, LA, USA;

    Computer Science Department, Louisiana State University, Baton Rouge, LA, USA;

  • 会议组织
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 TP311.13;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号