Fast Algorithms for Top-k Approximate String Matching

Abstract

Top-k approximate querying on string collections is an important data analysis tool for many applications and has been extensively studied. However, the scale of the problem has increased dramatically with the prevalence of the Web. In this paper, we study efficient algorithms for the top-k similar string matching problem. We introduce several efficient strategies, such as length-aware and adaptive q-gram selection. We present a general q-gram-based framework and propose two efficient algorithms built on these strategies. Experimental evaluation on three real data sets shows that our techniques achieve superior performance.
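The abstract does not spell out the two algorithms, so the following is only a minimal Python sketch of the general q-gram filter-and-verify idea such frameworks build on. It is not the authors' method and does not implement their adaptive q-gram selection. It assumes edit distance as the similarity measure (an assumption; the abstract only says "similar"), and the names `top_k_similar`, `qgrams`, `edit_distance` and the default `q=2` are hypothetical. For a threshold tau it applies two standard filters: a length filter (a string within edit distance tau of the query differs from it in length by at most tau) and the classic q-gram count filter (such a string shares at least max(|s|, |t|) - q + 1 - tau*q q-grams with the query). Candidates that survive are verified with an exact edit distance computation, and tau is raised until at least k matches are verified.

```python
from collections import Counter


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the (unpadded) q-grams of s; empty if len(s) < q."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def top_k_similar(query: str, strings: list[str], k: int,
                  q: int = 2) -> list[tuple[int, str]]:
    """Hypothetical helper: the k strings closest to `query` by edit
    distance, as (distance, string) pairs; ties keep input order."""
    grams = [qgrams(s, q) for s in strings]
    by_len: dict[int, list[int]] = {}          # length -> string ids
    for idx, s in enumerate(strings):
        by_len.setdefault(len(s), []).append(idx)
    query_grams = qgrams(query, q)
    # No edit distance can exceed the longest string involved.
    max_tau = max([len(query)] + [len(s) for s in strings])
    for tau in range(max_tau + 1):
        found = []
        # Length-aware filter: only lengths within tau of the query.
        for length in range(max(0, len(query) - tau), len(query) + tau + 1):
            for idx in by_len.get(length, []):
                s = strings[idx]
                # Count filter: minimum shared q-grams if ED <= tau.
                need = max(len(query), len(s)) - q + 1 - tau * q
                if need > 0 and sum((query_grams & grams[idx]).values()) < need:
                    continue
                d = edit_distance(query, s)   # verification step
                if d <= tau:
                    found.append((d, idx))
        # Every string with ED <= tau is now in `found`, so once there are
        # k of them, no unseen string can rank ahead of them.
        if len(found) >= k or tau == max_tau:
            found.sort()
            return [(d, strings[idx]) for d, idx in found[:k]]
    return []
```

For example, `top_k_similar("kitten", ["kitten", "sitting", "mitten", "bitten"], 3)` returns `[(0, 'kitten'), (1, 'mitten'), (1, 'bitten')]`. Here "sitting" is pruned by the count filter at tau = 1 without computing its edit distance. Raising tau one step at a time rescans the collection at each step. A practical system would instead use an inverted q-gram index and reuse work across thresholds, which is where strategies like those named in the abstract come in.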