...
首页> 外文期刊>Information Sciences: An International Journal >Efficient processing of substring match queries with inverted variable-length gram indexes
【24h】

Efficient processing of substring match queries with inverted variable-length gram indexes

机译:具有反相可变长度克索引的子字符串匹配查询的高效处理

获取原文
获取原文并翻译 | 示例

摘要

With the widespread use of the internet, text-based data sources have become ubiquitous and the demand for effective support of string matching queries continues to increase. The relational query language SQL supports LIKE clauses for string data to handle substring matching queries. Due to the popularity of such substring matching queries, there have been many studies on designing efficient indexes to support the LIKE clause in SQL. Among them, the indexes based on q-grams with fixed or variable sizes have been studied extensively. In this paper, we show that the optimal processing of substring matching queries with inverted variable-length gram indexes should be decided judiciously. However, the search space of finding the optimal variable-length gram set among those available is exponential with the number of grams. To reduce the search space, we present effective pruning strategies which do not sacrifice optimality. Based on the cost estimation, we propose the optimal algorithms to find the best plan with the minimum cost for substring matching queries using these pruning strategies. We also provide the approximate algorithms to overcome the exponential nature of search space to find an optimal query plan. Our performance study confirms that the proposed techniques improve query execution time for substring matching significantly compared to the running time of the traditional algorithms.
机译:随着互联网的广泛使用,基于文本的数据源已经变得无处不在,对串匹配查询的有效支持的需求继续增加。关系查询语言SQL支持字符串数据的子条件,以处理子字符串匹配查询。由于这种子字符串匹配查询的普及,有很多关于设计有效索引以支持SQL的子句的研究。其中,已经广泛研究了基于具有固定或可变尺寸的Q克的指标。在本文中,我们表明,应明智地决定具有反相可变长度克索引的子字符串匹配查询的最佳处理。但是,在可用的那些中找到最佳变量长度克定的搜索空间是指数与克的数量。为了减少搜索空间,我们提出了没有牺牲最佳性的有效修剪策略。根据成本估算,我们提出了最佳算法,以找到最佳计划,其中使用这些修剪策略的子字符串匹配查询的最低成本。我们还提供近似算法来克服搜索空间的指数性质,以找到最佳查询计划。我们的性能研究证实,与传统算法的运行时间相比,所提出的技术改善了对子字符串匹配的查询执行时间。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号