The VLDB Journal

Sampling from repairs of conditional functional dependency violations


Abstract

Violations of functional dependencies (FDs) and conditional functional dependencies (CFDs) are common in practice, often indicating deviations from the intended data semantics. These violations arise in many contexts, such as data integration and Web data extraction. Resolving them is challenging for a variety of reasons, one of which is the exponential number of possible repairs. Most previous work has tackled this problem by producing a single repair that is nearly optimal with respect to some metric. In this paper, we propose a novel data cleaning approach that is not limited to finding a single repair: sampling from the space of possible repairs. We give several motivating scenarios where sampling from the space of CFD repairs is desirable, we propose a new class of useful repairs, and we present an algorithm that efficiently draws random samples from this space. We also show how to restrict the space of repairs based on constraints that reflect the accuracy of different parts of the database. We experimentally evaluate our algorithms against previous approaches to show the utility and efficiency of our approach.
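The abstract does not spell out the sampling algorithm, so the following is only a rough illustration of what "sampling from the space of repairs" can mean. It handles a single variable CFD with a one-row pattern tableau and assumes a deliberately simple repair space: every group of pattern-matching tuples that agree on the left-hand side but conflict on the right-hand side is made consistent by giving all its tuples one of the RHS values already present in the group. The function names, the `_` wildcard notation, and the `trusted` parameter (a stand-in for accuracy constraints) are hypothetical. This is not the paper's class of repairs or its algorithm, and interactions among multiple CFDs are ignored.

```python
import random
from collections import defaultdict

WILDCARD = "_"  # assumed notation: a pattern entry that matches any value


def matches(row, pattern):
    """True if the row agrees with every constant in the CFD's LHS pattern."""
    return all(v == WILDCARD or row[attr] == v for attr, v in pattern.items())


def violation_groups(rows, lhs, rhs, pattern):
    """Index groups of pattern-matching rows that share LHS values but disagree on RHS."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if matches(row, pattern):
            groups[tuple(row[a] for a in lhs)].append(i)
    return [idx for idx in groups.values()
            if len({rows[i][rhs] for i in idx}) > 1]


def sample_repair(rows, lhs, rhs, pattern, trusted=frozenset(), rng=random):
    """Draw one repair, choosing uniformly and independently per conflicting group.

    Each conflicting group receives one RHS value already present in it.
    Rows in `trusted` (hypothetical accuracy constraint) restrict the choice
    to their own values; conflicting trusted rows make the group unrepairable.
    """
    repaired = [dict(r) for r in rows]  # copy so the input is never mutated
    for idx in violation_groups(rows, lhs, rhs, pattern):
        pinned = {rows[i][rhs] for i in idx if i in trusted}
        if len(pinned) > 1:
            raise ValueError(f"trusted rows in group {idx} disagree on {rhs!r}")
        candidates = sorted(pinned or {rows[i][rhs] for i in idx})
        chosen = rng.choice(candidates)
        for i in idx:
            repaired[i][rhs] = chosen
    return repaired


if __name__ == "__main__":
    rows = [
        {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
        {"country": "UK", "zip": "EH4", "city": "London"},
        {"country": "US", "zip": "EH4", "city": "Boston"},
    ]
    # CFD ([country='UK', zip] -> [city]): UK rows with equal zip need equal city.
    lhs, rhs = ["country", "zip"], "city"
    pattern = {"country": "UK", "zip": WILDCARD}
    rng = random.Random(7)
    for _ in range(3):
        print([r["city"] for r in sample_repair(rows, lhs, rhs, pattern, rng=rng)])
    # Treating row 0 as accurate leaves a single repair in this space.
    print([r["city"] for r in sample_repair(rows, lhs, rhs, pattern, trusted={0})])
    # prints ['Edinburgh', 'Edinburgh', 'Boston']
```

Repeated draws yield different consistent instances (either city for the two UK rows), while the US row is never touched because it does not match the pattern. Pinning row 0 as accurate shrinks the sampled space to one repair, which loosely mirrors how accuracy constraints can restrict the repair space.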