Conference: International Conference on Electrical Engineering and Informatics

A review of similarity measurement for record duplication detection



Abstract

Similarity measurement is an important process for determining the degree of similarity between two records. This paper presents a comparative analysis of the main similarity measurements used to detect duplicated records in databases. The work evaluates their strengths in terms of the efficiency of prevailing algorithms, the time required to process and identify duplicates, and accuracy. The analysis found that, among the most common similarity measurements, those based on the Jaro-Winkler algorithm significantly outperformed the others. The paper also presents an enhanced strategy based on the Jaro-Winkler algorithm to improve the detection of similarity among database records. Solving this problem will greatly enhance the quality of data used in decision-making.
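
For readers unfamiliar with the Jaro-Winkler measure named in the abstract, the following is a minimal sketch of the textbook formulation, not the enhanced strategy the paper proposes (which the abstract does not describe). The function names, the prefix scale of 0.1, the prefix cap of 4 characters, and the 0.9 duplicate threshold are illustrative assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Hypothetical helper: flag two field values as likely duplicates."""
    return jaro_winkler(a, b) >= threshold


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("DIXON", "DICKSONX"), 4))
```

In a record-matching setting a score like this would typically be computed per field (name, address, and so on) and compared with a threshold tuned on labelled data; the 0.9 used here is only a placeholder.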
