Journal: Computational Intelligence

Leveraging active learning to reduce human effort in the generation of ground-truth for entity resolution

Abstract

Several methods of entity resolution (ER) have been developed in academia and industry over the years to identify duplicate entities (e.g., records) in datasets. To evaluate the efficacy of such methods, their results must be compared with a ground-truth: a document listing all known duplicate record pairs in a dataset. For real datasets, ground-truths are generally produced manually by inspecting every possible pair of records. This process is error-prone and has quadratic complexity in the size of the dataset(s), so it takes a long time to complete. Some works therefore propose (semi)automatic approaches for generating ground-truths for the ER task. However, these approaches either are not applicable to several domains or still require considerable manual effort. In this work, we propose GTGenERAL, a semiautomatic approach that combines the results of multiple ER algorithms with active learning to generate accurate ground-truths with reduced manual effort. Experiments on real datasets show that GTGenERAL greatly reduces manual effort while generating ground-truths close to those produced by the state-of-the-art approach.
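The abstract does not say how GTGenERAL combines the ER outputs or which active-learning strategy it uses. The Python sketch below illustrates the general idea only and is not the authors' algorithm. It assumes each ER algorithm returns a set of predicted duplicate pairs and scores each candidate pair by a weighted vote. A fixed budget of human queries is then spent on the pairs whose score is closest to the decision threshold (uncertainty sampling). After each answer, every algorithm's weight is re-estimated. The names `build_ground_truth`, `num_pairs`, `oracle`, `budget` and `threshold` are hypothetical. The `num_pairs` helper shows the quadratic n(n-1)/2 pair count that makes exhaustive manual labeling slow.

```python
"""Hypothetical sketch: ground-truth generation from several ER outputs plus
uncertainty-sampling active learning. Not the GTGenERAL algorithm itself."""
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]  # record-id pair, assumed stored in sorted order


def num_pairs(n: int) -> int:
    """Pairs a reviewer must inspect exhaustively in one dataset of n records."""
    return n * (n - 1) // 2


def build_ground_truth(
    predictions: Dict[str, Set[Pair]],  # ER algorithm name -> predicted duplicates
    oracle: Callable[[Pair], bool],     # stands in for the human reviewer
    budget: int,                        # maximum number of human queries
    threshold: float = 0.5,             # weighted-vote cut-off for "duplicate"
) -> Tuple[Set[Pair], int]:
    algos = list(predictions)
    # Assumption: pairs that no algorithm flags are treated as non-duplicates.
    candidates: Set[Pair] = set().union(*predictions.values())
    labels: Dict[Pair, bool] = {}          # human answers collected so far
    weights = {a: 1.0 for a in algos}      # start with an unweighted vote

    def score(pair: Pair) -> float:
        """Weighted fraction of algorithms that flag `pair` as a duplicate."""
        total = sum(weights.values())
        agree = sum(weights[a] for a in algos if pair in predictions[a])
        return agree / total if total else 0.0

    for _ in range(budget):
        unlabeled = [p for p in candidates if p not in labels]
        if not unlabeled:
            break
        # Uncertainty sampling: ask about the pair whose score is nearest the
        # threshold (ties broken by pair id for determinism).
        query = min(unlabeled, key=lambda p: (abs(score(p) - threshold), p))
        labels[query] = oracle(query)
        # Re-fit the "model": each weight becomes the algorithm's
        # Laplace-smoothed accuracy on the pairs labeled so far.
        for a in algos:
            correct = sum((p in predictions[a]) == y for p, y in labels.items())
            weights[a] = (correct + 1) / (len(labels) + 2)

    # Human labels win; remaining candidates fall back to the weighted vote.
    truth = {p for p, y in labels.items() if y}
    truth |= {p for p in candidates if p not in labels and score(p) >= threshold}
    return truth, len(labels)


if __name__ == "__main__":
    preds = {
        "A": {("r1", "r2"), ("r3", "r4"), ("r5", "r6")},
        "B": {("r1", "r2"), ("r3", "r4")},
        "C": {("r1", "r2"), ("r7", "r8")},
    }
    reviewer_truth = {("r1", "r2"), ("r7", "r8")}

    def oracle(pair: Pair) -> bool:
        return pair in reviewer_truth

    print(sorted(build_ground_truth(preds, oracle, budget=0)[0]))
    # [('r1', 'r2'), ('r3', 'r4')]  (plain vote, no human input)
    print(sorted(build_ground_truth(preds, oracle, budget=10)[0]))
    # [('r1', 'r2'), ('r7', 'r8')]  (every candidate pair reviewed)
```

With `budget=0` the sketch reduces to a plain majority vote over the algorithms. Raising the budget spends more human effort so that the result follows the reviewer's labels more closely. Only pairs flagged by at least one algorithm are ever shown to the reviewer, instead of all n(n-1)/2 pairs.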