International Journal of Computational Vision and Robotics

Efficient indexing techniques for record matching and deduplication



Abstract

Record matching operates on large data sets, which may come from a single database or from several databases. As databases grow rapidly, the cost of the matching process becomes prohibitive. There is therefore a need to minimise the number of candidate record pairs, and with it the time and cost of comparing records, by using efficient matching techniques. Several researchers have recently studied record matching with various indexing techniques, but these techniques are not sufficiently effective. Suffix arrays (SA) and q-grams are existing indexing techniques, but they have computational shortcomings. This paper proposes two new indexing techniques, inverse suffix array (ISA) and Burrows-Wheeler transformation (BWT), to improve the performance of the record matching process. The ISA approach can handle multiple keywords simultaneously. We compare the performance of the proposed techniques with the existing suffix array and q-gram indexing techniques and find that the new techniques perform better.
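The abstract names the three structures without defining them. The following is a minimal sketch of their textbook definitions, applied to a single string such as a blocking key value. It is not the paper's indexing procedure, which the abstract does not describe. The function names and the `$` sentinel are assumptions made for illustration, and the suffix array is built by naive sorting, which is adequate only for short keys.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order."""
    # Naive construction: sort the positions by the suffix starting there.
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa):
    """Map each start position to the rank of its suffix: isa[sa[r]] == r."""
    isa = [0] * len(sa)
    for rank, start in enumerate(sa):
        isa[start] = rank
    return isa


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of text, computed from its suffix array.

    The sentinel (an assumed end marker) must not occur in text and must
    sort before every character of text.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # The BWT character for each sorted suffix is the one preceding it in s;
    # for the suffix at position 0 this wraps around to the sentinel.
    return "".join(s[i - 1] for i in suffix_array(s))


if __name__ == "__main__":
    key = "banana$"
    sa = suffix_array(key)
    print(sa)                        # [6, 5, 3, 1, 0, 4, 2]
    print(inverse_suffix_array(sa))  # [4, 3, 6, 2, 5, 1, 0]
    print(bwt("banana"))             # annb$aa
```

The inverse suffix array answers "what rank does the suffix starting at position i have?" in constant time, whereas the suffix array answers the opposite question. The BWT is a permutation of the input characters derived from the same sorted order.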
