Journal: Expert Systems with Applications

Using positional sequence patterns to estimate the selectivity of SQL LIKE queries

Abstract

Sequence patterns are frequently employed in expert system applications across a wide range of domains, from bioinformatics to smart homes and stock market analysis. Regular sequence patterns cannot express whether two consecutive items in a pattern occur right after each other in all occurrences of the pattern in an item database. Such a distinction may be important for many intelligent system applications, for instance, to better address business questions like "should two items that are frequently bought together be located right next to each other on a retail store shelf, or is it OK to place them at some distance as long as they are in the same aisle?". In this paper, we propose a novel type of sequence pattern, called "positional sequence patterns", and illustrate its application on a special expert system, i.e., the query planner/optimizer of a database management system. Positional sequence patterns accommodate extra information on whether a frequent ordered item pair always occurs next to each other, without any gap in between, in all pattern occurrences. Since existing sequence pattern mining algorithms do not consider positional sequence patterns, we also propose an algorithm to mine them. Next, we integrate positional sequence patterns into the selectivity estimation component of the query optimizer as an expert system application. More specifically, a histogram-like structure of positional sequence patterns is created and stored in the knowledge base of the query optimizer. Then, at query optimization time, these histograms are used to infer the selectivity of flexible text queries enabled by the SQL LIKE operator. In particular, the proposed selectivity estimation method employs redundant pattern elimination based on pattern information content during histogram construction, and a partitioning-based matching scheme. Experimental results on a real dataset from DBLP show that the proposed approach outperforms the state of the art, improving error rates by around 20%. (C) 2020 Elsevier Ltd. All rights reserved.
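
The adjacency idea in the abstract can be illustrated with a minimal Python sketch. This is not the mining algorithm or the histogram structure proposed in the paper, neither of which is described in the abstract. It assumes a simplified reading of "always occurs next to each other": an ordered pair counts as always adjacent if every row that contains the first item somewhere before the second also contains the two items back to back. The function names `subsequence_rows` and `is_always_adjacent` and the sample titles are hypothetical.

```python
def subsequence_rows(rows, first, second):
    """Return the rows where `first` occurs somewhere before `second`.

    Gaps are allowed, so this mirrors the rows matched by
    LIKE '%first%second%'.
    """
    matched = []
    for row in rows:
        i = row.find(first)
        # Look for `second` only after the end of the earliest `first`.
        if i != -1 and row.find(second, i + len(first)) != -1:
            matched.append(row)
    return matched


def is_always_adjacent(rows, first, second):
    """Assumed simplification: True if every supporting row also contains
    `first` immediately followed by `second` (no gap)."""
    supporting = subsequence_rows(rows, first, second)
    return bool(supporting) and all(first + second in row for row in supporting)


# Hypothetical sample data standing in for a text column.
titles = [
    "data mining",
    "database systems",
    "mining data streams",
    "query optimization",
]

print(is_always_adjacent(titles, "da", "ta"))  # "da" is always directly followed by "ta"
print(is_always_adjacent(titles, "m", "n"))    # "m" ... "n" occurs, but never back to back
```

Under this assumed definition, when a pair is always adjacent, `LIKE '%da%ta%'` and `LIKE '%data%'` match exactly the same rows, which is the kind of extra information a selectivity estimator could exploit.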
