Conference: International Conference on Very Large Data Bases

Domain-Aware Multi-Truth Discovery from Conflicting Sources


Abstract

In the Big Data era, truth discovery has emerged as a promising technique for resolving conflicts among the facts provided by numerous data sources. The central challenge is to estimate source reliability and select the answers supported by high-quality sources. However, existing work assumes that a data source is equally reliable for every kind of entity, ignoring the possibility that its reliability varies across domains. To capture the influence of differing levels of expertise across domains, we integrate domain expertise knowledge to estimate source reliability more precisely. We infer the domain expertise of a data source from its data richness in each domain. We also study the mutual influence between domains, which affects the inference of domain expertise. By leveraging a distinctive feature of the multi-truth problem, namely that sources may provide partially correct values for a data item, we assign more reasonable confidence scores to value sets. We propose an integrated Bayesian approach that incorporates the domain expertise of data sources and the confidence scores of value sets, aiming to find multiple possible truths without any supervision. Experimental results on two real-world datasets demonstrate the feasibility, efficiency, and effectiveness of our approach.
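
To make the idea of domain-aware source weighting concrete, the following is a minimal illustrative sketch, not the integrated Bayesian approach the abstract describes. It assumes that "data richness" can be approximated by the fraction of a domain's items a source covers, and it replaces Bayesian inference with a simple expertise-weighted vote. The function names (`domain_expertise`, `discover_truths`), the input layout, and the 0.5 acceptance threshold are all hypothetical choices made for this illustration.

```python
from collections import defaultdict


def domain_expertise(claims, domain_of):
    """Approximate each source's expertise in a domain by its coverage.

    claims: list of (source, item, set_of_values) tuples (assumed layout).
    domain_of: dict mapping each item to its domain (assumed to be given).
    Returns {(source, domain): fraction of that domain's items the source covers}.
    """
    items_per_domain = defaultdict(set)
    covered = defaultdict(set)
    for source, item, _values in claims:
        domain = domain_of[item]
        items_per_domain[domain].add(item)
        covered[(source, domain)].add(item)
    return {
        (source, domain): len(items) / len(items_per_domain[domain])
        for (source, domain), items in covered.items()
    }


def discover_truths(claims, domain_of, threshold=0.5):
    """Accept every value whose expertise-weighted support reaches the threshold.

    Several values per item may be accepted, which is what makes this a
    multi-truth setting. The threshold is an assumption of this sketch.
    """
    expertise = domain_expertise(claims, domain_of)
    support = defaultdict(lambda: defaultdict(float))  # item -> value -> weight
    total = defaultdict(float)  # item -> summed weight of claiming sources
    for source, item, values in claims:
        weight = expertise[(source, domain_of[item])]
        total[item] += weight
        for value in values:
            support[item][value] += weight
    return {
        item: {v for v, w in votes.items() if w / total[item] >= threshold}
        for item, votes in support.items()
    }


# Toy data: source s1 covers both book items, s2 and s3 cover only one.
domain_of = {"book1": "books", "book2": "books", "film1": "films"}
claims = [
    ("s1", "book1", {"Alice", "Bob"}),
    ("s1", "book2", {"Carol"}),
    ("s2", "book1", {"Alice"}),
    ("s2", "film1", {"Dan"}),
    ("s3", "book1", {"Alice", "Eve"}),
]
truths = discover_truths(claims, domain_of)
```

In this toy data, "Bob" and "Eve" are each claimed by exactly one source for `book1`, so an unweighted vote could not separate them. Because `s1` covers more of the books domain than `s3`, the weighted vote keeps "Bob" alongside "Alice" and drops "Eve".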
