首页> 外文期刊>Data & Knowledge Engineering >Efficient Algorithms For Incremental Maintenance Of Closed Sequential Patterns In Large Databases
【24h】

Efficient Algorithms For Incremental Maintenance Of Closed Sequential Patterns In Large Databases

机译:大型数据库中封闭顺序模式增量维护的高效算法

获取原文
获取原文并翻译 | 示例
       

摘要

Recent study shows that mining compact frequent patterns (such as closed patterns and compressed patterns) can alleviate the interpretability and efficiency problem encountered by traditional frequent pattern mining methods. Compact frequent patterns keep exact or approximate supports of a complete set of frequent patterns, and the number of them is often orders of magnitude smaller. Several efficient algorithms have been proposed to mine compact sequential patterns. However, sequence databases are not always static. Sequences (or items) are often added to and deleted from databases. A slight change made on a database may lead to the change of compact patterns. Mining from scratch is very time-consuming and thus infeasible. In this paper, we explore how to efficiently maintain closed sequential patterns in a dynamic sequence database environment. A compact structure CSTree is designed to keep closed sequential patterns, and its nice properties are carefully studied. Two efficient algorithms, IMCS_A and IMCS_D, are developed to maintain the CSTree upon incremental update. The algorithms make full use of the properties of CSTree to find nodes whose states are obsolete and avoid unnecessary node extension and closure checking operations to accelerate the incremental update process. A thorough experimental study on various real and synthetic datasets shows that the proposed algorithms outperform the state-of-the-art algorithms - PrefixSpan, CloSpan, BIDE and a recently proposed incremental mining algorithm IncSpan by about a factor of 4 to more than an order of magnitude.
机译:最近的研究表明,挖掘紧凑的频繁模式(例如闭合模式和压缩模式)可以缓解传统频繁模式挖掘方法遇到的可解释性和效率问题。紧凑的频繁模式保留了完整的频繁模式集的精确或近似支持,并且它们的数量通常小几个数量级。已经提出了几种有效的算法来挖掘紧凑的顺序模式。但是,序列数据库并不总是静态的。序列(或项目)通常添加到数据库或从数据库中删除。对数据库进行的细微更改可能会导致压缩模式的更改。从头开始开采非常耗时,因此不可行。在本文中,我们探索了如何在动态序列数据库环境中有效地维护封闭的顺序模式。紧凑的结构CSTree旨在保持闭合的连续模式,并仔细研究了其良好的属性。开发了两种有效的算法IMCS_A和IMCS_D,以在增量更新时维护CSTree。该算法充分利用CSTree的属性来查找状态已过时的节点,并避免了不必要的节点扩展和关闭检查操作以加速增量更新过程。对各种真实和合成数据集进行的全面实验研究表明,所提出的算法优于最新算法-PrefixSpan,CloSpan,BIDE和最近提出的增量挖掘算法IncSpan大约是一个数量级的4倍。数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号