Cleansing uncertain databases leveraging aggregate constraints



Abstract

Emerging uncertain database applications often involve cleansing (conditioning) uncertain databases, using additional information as new evidence to reduce uncertainty. However, past research on conditioning probabilistic databases has focused only on functional dependencies. In real-world applications, most additional information about uncertain data sets can be acquired in the form of aggregate constraints (e.g., aggregate results published online for various statistical purposes). Taking these aggregate constraints into account can therefore greatly reduce the uncertainty in a data set. However, finding a practical method that exploits aggregate constraints to decrease uncertainty is a very challenging problem. In this paper, we present three approaches to cleansing (conditioning) uncertain databases by employing aggregate constraints. Because the problem is NP-hard, we focus on the two approximation strategies, which model the problem as a nonlinear optimization problem and then use Simulated Annealing (SA) and an Evolutionary Algorithm (EA) to sample from the entire solution space of possible worlds. To favor possible worlds that have higher probabilities and also satisfy all the constraints, we define Satisfaction Degree Functions (SDF) and construct the objective function accordingly. Based on the sample result, we then remove duplicates, re-normalize the probabilities of all qualified possible worlds, and derive the posterior probabilistic database. Our experiments verify the efficiency and effectiveness of our algorithms and show that the approximate approaches scale well to large databases.
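
The sampling pipeline described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not give the form of the Satisfaction Degree Functions, the objective, or the cooling schedule, so everything below is an assumption made for illustration. In particular, the toy database `DB`, the constraint `TARGET_SUM`, the function names, the reciprocal-distance satisfaction degree, the product-form objective, and the geometric cooling schedule are all hypothetical. Only the Simulated Annealing variant is sketched.

```python
import math
import random

# Hypothetical x-tuple model: each uncertain record lists mutually exclusive
# (value, probability) alternatives; records are assumed independent.
DB = [
    [(10, 0.6), (20, 0.4)],
    [(5, 0.5), (15, 0.3), (25, 0.2)],
    [(30, 0.7), (40, 0.3)],
]
TARGET_SUM = 55  # assumed published aggregate constraint: SUM(value) = 55


def world_prob(world):
    """Prior probability of a world (one chosen alternative per record)."""
    return math.prod(DB[i][c][1] for i, c in enumerate(world))


def satisfaction(world):
    """Illustrative satisfaction degree (an assumption, not the paper's SDF):
    1.0 when the SUM constraint holds, decaying as the sum moves away."""
    total = sum(DB[i][c][0] for i, c in enumerate(world))
    return 1.0 / (1.0 + abs(total - TARGET_SUM))


def objective(world):
    # Rewards worlds that are both probable and close to satisfying the constraint.
    return world_prob(world) * satisfaction(world)


def anneal_sample(steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over possible worlds; returns the distinct
    constraint-satisfying worlds visited along the way."""
    rng = random.Random(seed)
    world = tuple(rng.randrange(len(alts)) for alts in DB)
    qualified = set()  # a set removes duplicate worlds
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i = rng.randrange(len(DB))  # neighbor: re-pick one record's alternative
        cand = world[:i] + (rng.randrange(len(DB[i])),) + world[i + 1:]
        delta = objective(cand) - objective(world)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            world = cand
        if satisfaction(world) == 1.0:
            qualified.add(world)
    return qualified


def posterior(qualified):
    """Re-normalize prior probabilities over the sampled qualified worlds."""
    z = sum(world_prob(w) for w in qualified)
    return {w: world_prob(w) / z for w in qualified}


if __name__ == "__main__":
    for w, p in sorted(posterior(anneal_sample()).items()):
        print(w, round(p, 3))
```

In this toy database exactly three of the twelve possible worlds sum to 55, with prior probabilities 0.09, 0.126, and 0.14. If the sampler reaches all three, re-normalizing by their total of 0.356 gives posterior probabilities of roughly 0.253, 0.354, and 0.393. A sampler that misses a qualifying world would re-normalize over fewer worlds, which is the approximation such a scheme accepts in exchange for not enumerating the full space.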


Bibliographic details

Source
Data Engineering Workshops (ICDEW), 2010 | 2010 | pp. 128-135 | 8 pages
Authors
Haiquan Chen; Wei-Shinn Ku; Haixun Wang
Original format: PDF
Chinese Library Classification: data processing; data processing systems

