International Conference on Data Warehousing and Knowledge Discovery

Processing Sequential Patterns in Relational Databases

Abstract

Database integration of data mining has gained popularity, and its significance is well recognized. However, SQL-based data mining is known to lag behind specialized implementations because of the prohibitive cost of extracting knowledge and the lack of suitable declarative query language support. Recent studies have found that, for association rule mining and sequential pattern mining, carefully tuned SQL formulations can achieve performance comparable to systems that cache the data in files outside the DBMS. However, most previous pattern mining methods follow the Apriori approach, which still encounters problems when the sequential database is large and/or when the sequential patterns to be mined are numerous and long. In this paper, we present a novel SQL-based approach that we recently proposed, called Prospad (PROjection Sequential PAttern Discovery). Prospad differs fundamentally from the Apriori-like candidate generation-and-test approach: it is a pattern-growth approach without candidate generation. It grows longer patterns from shorter ones by successively projecting the sequential table into subsequential tables. Since the projected table for a sequential pattern i contains all and only the information necessary for mining the sequential patterns that can grow from i, the size of the projected table usually shrinks quickly as mining proceeds to longer patterns. Moreover, a depth-first approach is used to facilitate the projection process, which avoids the cost of creating and dropping some temporary tables.
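The projection idea in the abstract can be illustrated with a minimal sketch. This is not the Prospad SQL from the paper: the `(sid, pos, item)` schema, the restriction to one item per event, and the names `load`, `mine`, and `proj_<depth>` are all assumptions made for illustration. The sketch only shows how a pattern can be grown depth-first by projecting a table onto the suffixes that follow the first occurrence of a frequent item.

```python
import sqlite3


def load(conn, sequences):
    """Store sequences as rows (sid, pos, item). Hypothetical schema, not the paper's."""
    conn.execute("CREATE TABLE seq (sid INTEGER, pos INTEGER, item TEXT)")
    rows = [
        (sid, pos, item)
        for sid, items in sequences.items()
        for pos, item in enumerate(items, start=1)
    ]
    conn.executemany("INSERT INTO seq VALUES (?, ?, ?)", rows)


def mine(conn, table, prefix, min_sup, out, depth=0):
    """Depth-first pattern growth: count frequent items, project, recurse."""
    # Items that occur in at least min_sup distinct sequences of this table.
    frequent = conn.execute(
        f"SELECT item, COUNT(DISTINCT sid) FROM {table} "
        "GROUP BY item HAVING COUNT(DISTINCT sid) >= ? ORDER BY item",
        (min_sup,),
    ).fetchall()
    for item, support in frequent:
        pattern = prefix + (item,)
        out[pattern] = support
        # Table names are generated here, never taken from user input.
        child = f"proj_{depth + 1}"
        conn.execute(
            f"CREATE TEMP TABLE {child} (sid INTEGER, pos INTEGER, item TEXT)"
        )
        # Projection: keep only the rows after the first occurrence of `item`
        # in each sequence, so the child table holds just the suffixes from
        # which `pattern` can still grow.
        conn.execute(
            f"INSERT INTO {child} "
            f"SELECT t.sid, t.pos, t.item FROM {table} AS t "
            f"JOIN (SELECT sid, MIN(pos) AS first_pos FROM {table} "
            "WHERE item = ? GROUP BY sid) AS f "
            "ON t.sid = f.sid AND t.pos > f.first_pos",
            (item,),
        )
        mine(conn, child, pattern, min_sup, out, depth + 1)
        # Depth-first order means at most one projected table per depth exists.
        conn.execute(f"DROP TABLE {child}")
    return out


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, {1: ["a", "b", "c"], 2: ["a", "c"], 3: ["b", "a", "c"]})
    for pattern, support in sorted(mine(conn, "seq", (), 2, {}).items()):
        print(pattern, support)
```

Because the traversal is depth-first, only the projected tables along the current prefix path exist at any moment, which is one plausible reading of the temporary-table saving mentioned in the abstract. No candidate patterns are generated: every pattern that is recorded was found by counting items in an existing projected table.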
