首页> 外文会议>International conference on database systems for advanced applications >Drawing Density Core-Sets from Incomplete Relational Data
【24h】

Drawing Density Core-Sets from Incomplete Relational Data

机译:从不完整的关系数据绘制密度核心集

获取原文

摘要

Incompleteness is a ubiquitous issue and brings challenges to answer queries with completeness guaranteed. A density core-set is a subset of an incomplete dataset, whose completeness is approximate to the completeness of the entire dataset. Density core-sets are effective mechanisms to estimate completeness of queries on incomplete datasets. This paper studies the problems of drawing density core-sets on incomplete relational data. To the best of our knowledge, there is no such proposal in the past. (1) We study the problems of drawing density core-sets in different requirements, and prove the problems are all NP-Complete whether functional dependencies are given. (2) An efficient approximate algorithm to draw an approximate density core-set is proposed, where an approximate Knapsack algorithm and weighted sampling techniques are employed to select important candidate tuples. (3) Analysis of the proposed approximate algorithm shows the relative error between completeness of the approximate density core-set and that of a density core-set with same size is within a given relative error bound with high probability. (4) Experiments on both real-world and synthetic datasets demonstrate the effectiveness and efficiency of the algorithm.
机译:不完整性是一个无处不在的问题,并带来挑战,以确保完整性回答查询。密度核心集是不完整数据集的子集,其完整性近似于整个数据集的完整性。密度核心集是有效的机制,以估算不完整数据集查询的完整性。本文研究了在不完全关系数据上绘制密度核心集的问题。据我们所知,过去没有这样的建议。 (1)我们研究了不同要求的绘制密度核心集的问题,并证明了所有NP完成的问题是否给出了功能依赖性。 (2)提出了一种有效的绘制近似密度核心集的近似算法,其中采用近似背包算法和加权采样技术来选择重要的候选元组。 (3)提出的近似算法的分析显示了近似密度核心集的完整性之间的相对误差,并且具有相同尺寸的密度核心集的相对误差在具有高概率的给定相对误差中。 (4)实际和合成数据集的实验证明了算法的有效性和效率。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号