Efficient query processing for uncertain data.

Abstract

Applications with uncertain data pose many challenges for data management and query processing. This dissertation advances the state of the art for efficient query processing over uncertain data. We study three types of probabilistic queries: nearest-neighbor queries, skyline queries, and general select-project-join queries. All three can leverage a probability threshold for pruning, so that only results that satisfy the query with probability above the given threshold are returned. For nearest-neighbor queries, we design novel indexes and data structures to monitor the pruning status and uncover pruning opportunities. For skyline queries, we propose two filtering schemes to quickly identify interesting instances whose skyline probabilities are over the threshold: i) by bounding an instance's skyline probability, and ii) by comparing the instance with others based on the dominance relationship. For applications of skyline analysis where "thresholding" is not desirable, we propose the problem of computing all skyline probabilities and, for the first time, present two worst-case sub-quadratic algorithms for it. We further give an efficient algorithm to solve the online version of the problem. Finally, we study general select-project-join (SPJ) queries under the Orion uncertainty model and propose optimization rules that leverage the threshold for early pruning of unqualified tuples. We also extend our study to SPJ queries with duplicate elimination. We adopt a general tuple uncertainty model for this case and design new techniques for handling duplicate elimination. Our experiments on various data sets show that our techniques are both effective and efficient.
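As a rough illustration of what a thresholded probabilistic skyline query computes, the sketch below assumes a common discrete model that the abstract itself does not spell out: each uncertain object has a few mutually exclusive instances with probabilities, objects are independent, and smaller values are better in every dimension. Under those assumptions, an instance's skyline probability is its own probability times the probability that no other object materializes as an instance dominating it. The function names and the sample data are hypothetical, and the quadratic loop is a naive baseline, not the dissertation's filtering schemes or its sub-quadratic algorithms.

```python
from math import prod
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
# Assumed model: object name -> list of (instance point, probability) pairs.
UncertainData = Dict[str, List[Tuple[Point, float]]]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere
    (smaller is better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instance_skyline_probabilities(data: UncertainData) -> Dict[Tuple[str, int], float]:
    """Naive quadratic computation of every instance's skyline probability."""
    result = {}
    for obj, instances in data.items():
        for idx, (point, p) in enumerate(instances):
            # Probability that each other object does NOT appear as a dominating instance.
            survive = prod(
                1.0 - sum(q for other_point, q in other_instances
                          if dominates(other_point, point))
                for other, other_instances in data.items()
                if other != obj
            )
            result[(obj, idx)] = p * survive
    return result


def threshold_filter(probs: Dict[Tuple[str, int], float], tau: float):
    """Keep only instances whose skyline probability exceeds the threshold tau."""
    return {key: value for key, value in probs.items() if value > tau}


# Hypothetical two-object data set, used only for illustration.
sample: UncertainData = {
    "A": [((1.0, 4.0), 0.5), ((3.0, 3.0), 0.5)],
    "B": [((2.0, 2.0), 0.6), ((5.0, 1.0), 0.4)],
}
probabilities = instance_skyline_probabilities(sample)
qualified = threshold_filter(probabilities, tau=0.3)
```

In this sample, the second instance of `A` is dominated by the first instance of `B`, so its skyline probability drops below the threshold and it is filtered out. The bounding and dominance-based filters mentioned in the abstract aim to reach the same decision without evaluating the full product for every instance.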

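Bibliographic record

  • Author

    Qi, Yinian

  • Author affiliation

    Purdue University

  • Degree-granting institution: Purdue University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 181 p.
  • Total pages: 181
  • Original format: PDF
  • Language of text: English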
