International Conference on Scalable Uncertainty Management

Evaluating Indeterministic Duplicate Detection Results

Abstract

Duplicate detection is an important step in cleaning or integrating data. Since real-life data is often polluted, detecting duplicates usually involves uncertainty. To handle this uncertainty appropriately, indeterministic duplicate detection approaches have been developed, i.e., approaches in which ambiguous duplicate decisions are modeled probabilistically in the resulting data. To assess how good a duplicate detection approach is, the quality of its detection results needs to be evaluated. In this paper, we propose several semantics for applying traditional quality evaluation measures to indeterministic duplicate detection results and, as an example, present an efficient evaluation for one of these semantics. Finally, we present some experimental results.