Conference: International Conference on Extending Database Technology

Approximate substring selectivity estimation

Abstract

We study the problem of estimating the selectivity of approximate substring queries. Its importance in databases keeps increasing as more and more data are entered by users and integrated, carrying many typographical errors and different spelling conventions. To begin with, we consider edit distance as the similarity measure between a pair of strings. Based on information stored in an extended N-gram table, we propose two estimation algorithms for the task, MOF and LBS. The latter extends the former with ideas from set hashing signatures. The experimental results show that MOF is a lightweight algorithm that gives fairly accurate estimates. However, if more space is available, LBS can give better accuracy than MOF and other baseline methods. Next, we extend the proposed solution to other similarity predicates: the SQL LIKE operator and Jaccard similarity.
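
To make the estimated quantity concrete, the following is a minimal sketch of how the exact selectivity of an approximate substring query could be computed by brute force under edit distance. It is not MOF or LBS, which the abstract does not describe in detail. It only illustrates the ground truth that such estimators approximate. The function names `min_substring_edit_distance` and `exact_selectivity`, the threshold parameter `max_edits`, and the sample strings are all assumptions introduced for illustration.

```python
from typing import Iterable


def min_substring_edit_distance(query: str, text: str) -> int:
    """Smallest edit distance between `query` and any substring of `text`.

    Uses the standard approximate-matching dynamic program (Sellers' variant
    of edit distance), where a match may start at any position of `text`.
    """
    # Row for the empty query prefix: it matches the empty substring
    # ending at any position of `text` at cost 0.
    prev = [0] * (len(text) + 1)
    for i, qc in enumerate(query, start=1):
        # Query prefix of length i against an empty substring costs i edits.
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if qc == tc else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitute
                prev[j] + 1,         # query character has no counterpart in text
                curr[j - 1] + 1,     # text character has no counterpart in query
            )
        prev = curr
    # The match may end at any position of `text`.
    return min(prev)


def exact_selectivity(query: str, strings: Iterable[str], max_edits: int) -> int:
    """Number of strings containing a substring within `max_edits` of `query`."""
    return sum(
        1 for s in strings
        if min_substring_edit_distance(query, s) <= max_edits
    )


# Hypothetical sample data, not taken from the paper.
names = ["john smith", "jon smyth", "smithe", "jones", "smit"]
print(exact_selectivity("smith", names, max_edits=1))
```

Scanning every string this way costs time proportional to the total data size for each query, which is why a compact summary such as an N-gram table is attractive for estimation.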
