Drawing Density Core-Sets from Incomplete Relational Data

Abstract

Incompleteness is a ubiquitous issue that makes it challenging to answer queries with guaranteed completeness. A density core-set is a subset of an incomplete dataset whose completeness approximates that of the entire dataset. Density core-sets are an effective mechanism for estimating the completeness of queries on incomplete datasets. This paper studies the problem of drawing density core-sets from incomplete relational data. To the best of our knowledge, no such proposal has been made before. (1) We study the problems of drawing density core-sets under different requirements and prove that they are all NP-complete, whether or not functional dependencies are given. (2) We propose an efficient approximation algorithm for drawing an approximate density core-set, which employs an approximate Knapsack algorithm and weighted sampling techniques to select important candidate tuples. (3) Analysis of the proposed algorithm shows that, with high probability, the relative error between the completeness of the approximate density core-set and that of a density core-set of the same size is within a given relative error bound. (4) Experiments on both real-world and synthetic datasets demonstrate the effectiveness and efficiency of the algorithm.

Bibliographic details

Source
International Conference on Database Systems for Advanced Applications, 2017, pp. 527-542 (16 pages)
Authors
Yongnan Liu; Jianzhong Li; Hong Gao
Original format
PDF
Keywords
Data quality; Density core-sets; Incomplete data; Query completeness estimation

Similar literature

1. Clustering incomplete relational data using the non-Euclidean relational fuzzy c-means algorithm [J]. Richard J. Hathaway, James C. Bezdek. Pattern Recognition Letters. 2002, No. 1-3
2. BIM Interoperability and Relational Databases Intelligently Linking Drawings and Data [J]. Architectural Record. 2011, No. 11
3. Leveraging Node Attributes for Incomplete Relational Data [J]. He Zhao, Lan Du, Wray Buntine. JMLR: Workshop and Conference Proceedings. 2017, No. 3
4. Drawing Density Core-Sets from Incomplete Relational Data [C]. Yongnan Liu, Jianzhong Li, Hong Gao. International Conference on Database Systems for Advanced Applications. 2017
5. A relational model for incomplete information in temporal databases [D]. Nair, Sunil S. 1993
6. Relational Database Structure to Manage High-Density Tissue Microarray Data and Images for Pathology Studies Focusing on Clinical Outcome [O]. Sargum Manley, Neil R. Mucci, Angelo M. De Marzo. 2001
7. An approach to extending the relational database model for handling incomplete information and data dependencies [O]. Hồ Thuần, Hồ Cẩm Hà. 2012
8. Problem of Incomplete Information in Relational Databases [R]. Grahne, G. 1989
