首页> 外文会议>International conference on database systems for advanced applications >Drawing Density Core-Sets from Incomplete Relational Data
【24h】

Drawing Density Core-Sets from Incomplete Relational Data

机译:从不完整的关系数据中绘制密度核心集

获取原文

摘要

Incompleteness is a ubiquitous issue and brings challenges to answer queries with completeness guaranteed. A density core-set is a subset of an incomplete dataset, whose completeness is approximate to the completeness of the entire dataset. Density core-sets are effective mechanisms to estimate completeness of queries on incomplete datasets. This paper studies the problems of drawing density core-sets on incomplete relational data. To the best of our knowledge, there is no such proposal in the past. (1) We study the problems of drawing density core-sets in different requirements, and prove the problems are all NP-Complete whether functional dependencies are given. (2) An efficient approximate algorithm to draw an approximate density core-set is proposed, where an approximate Knapsack algorithm and weighted sampling techniques are employed to select important candidate tuples. (3) Analysis of the proposed approximate algorithm shows the relative error between completeness of the approximate density core-set and that of a density core-set with same size is within a given relative error bound with high probability. (4) Experiments on both real-world and synthetic datasets demonstrate the effectiveness and efficiency of the algorithm.
机译:不完整性是一个普遍存在的问题,它在保证完整性的前提下回答查询带来了挑战。密度核心集是不完整数据集的子集,其完整性近似于整个数据集的完整性。密度核心集是评估不完整数据集查询完整性的有效机制。本文研究了在不完整的关系数据上绘制密度核集的问题。据我们所知,过去没有这样的提议。 (1)研究了在不同需求下绘制密度核集的问题,并证明了这些问题都是NP-Complete的,无论是否给出了功能依赖性。 (2)提出了一种有效的近似算法来绘制近似密度核集,其中采用近似背包算法和加权采样技术来选择重要的候选元组。 (3)对提出的近似算法的分析表明,近似密度核集的完整性和具有相同大小的密度核集的完整性之间的相对误差在给定的相对误差范围内具有很高的概率。 (4)在真实数据集和综合数据集上的实验证明了该算法的有效性和效率。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号