A Type-Based Blocking Technique for Efficient Entity Resolution over Large-Scale Data


Abstract

In data integration, entity resolution (ER) is an important technique for improving data quality. Existing research typically assumes that the target dataset contains only string-type data and uses a single similarity metric. For larger, high-dimensional datasets, redundant information needs to be verified using traditional blocking or windowing techniques. In this work, we propose a novel ER method that uses a hybrid approach, including type-based multiblocks, varying window size, and more flexible similarity metrics. In our new ER workflow, we reduce the search space for entity pairs by constraining on redundant attributes and matching likelihood. We develop a reference implementation of the proposed approach and validate its performance using a real-life dataset from an Internet of Things project. We evaluate the data processing system using five standard metrics: effectiveness, efficiency, accuracy, recall, and precision. Experimental results indicate that the proposed approach could be a promising alternative for entity resolution and could feasibly be applied to real-world data cleaning for large datasets.
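The abstract names three ingredients: blocks built per attribute type, a window size that varies between blocks, and a similarity metric chosen to suit the data. The following is a minimal Python sketch of how such a pipeline might fit together. It is not the authors' reference implementation. The function names, the sample records, the window sizes, the 0.8 threshold, and the choice of metrics (relative difference for numbers, `difflib.SequenceMatcher` ratio for strings) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Choose a metric by value type (assumed metrics, not the paper's)."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        denom = max(abs(a), abs(b))
        if denom == 0:
            return 1.0
        # Relative difference, clamped so opposite signs do not go negative.
        return max(0.0, 1.0 - abs(a - b) / denom)
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def sort_key(value):
    """Case-insensitive ordering for strings, natural ordering otherwise."""
    return value.lower() if isinstance(value, str) else value


def candidate_pairs(records, key, window):
    """Sort on one attribute and pair each record with its window neighbours."""
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i][key]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


def resolve(records, windows, threshold=0.8):
    """windows maps attribute name -> window size, one pass per attribute."""
    candidates = set()
    for key, window in windows.items():
        candidates |= candidate_pairs(records, key, window)
    matches = []
    for i, j in sorted(candidates):
        scores = [similarity(records[i][k], records[j][k]) for k in windows]
        if sum(scores) / len(scores) >= threshold:
            matches.append((i, j))
    return matches


# Hypothetical IoT-style records; values are invented for the example.
records = [
    {"name": "sensor-A12", "temp": 21.5},
    {"name": "Sensor-A12", "temp": 21.4},
    {"name": "sensor-B07", "temp": 35.0},
    {"name": "gateway-01", "temp": 21.5},
]
windows = {"name": 2, "temp": 3}  # a different window size per attribute

print(resolve(records, windows))  # [(0, 1)]
```

Only pairs that fall inside some attribute's window are ever compared, which is how blocking and windowing shrink the search space. On a dataset this small the saving is invisible, but the number of candidates grows roughly linearly with the record count for fixed window sizes, rather than quadratically.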

Bibliographic record

  • Source
    Journal of Sensors | 2018, Issue 1 | 12 pages
  • Author affiliations

    Changzhou Univ, Sch Informat Sci & Engn, Changzhou 213164, Peoples R China;

    Changzhou Univ, Sch Informat Sci & Engn, Changzhou 213164, Peoples R China;

    Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China;

    Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China;

    Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China;

    Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China;

    Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China;

    Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: TP212
  • Keywords
