Journal: Data Mining and Knowledge Discovery

Fast incremental mining of web sequential patterns with PLWAP tree

Abstract

Pointing and clicking on web pages generates continuous data sequences that flow into the web log, so previously mined web sequential patterns must be updated. Algorithms for mining web sequential patterns from scratch include WAP, PLWAP and the Apriori-based GSP. When updating patterns, incrementally reusing the old patterns together with only the recently added data sequences would achieve fast response times with reasonable memory usage. This paper proposes two algorithms, RePL4UP (Revised PLWAP For UPdate) and PL4UP (PLWAP For UPdate), which use the PLWAP tree structure to update web sequential patterns incrementally and efficiently without scanning the whole database, even when previously small (infrequent) items become frequent. RePL4UP concisely stores the position codes of small items in the database sequences in its metadata during tree construction. During mining, RePL4UP scans only the newly added database sequences and, using the small-item position codes, revises the old PLWAP tree to restore information on previously small items that have become frequent while deleting previously frequent items that have become small. PL4UP initially builds a larger PLWAP tree that includes all sequences in the database, using a tolerance support t that is lower than the regular minimum support s. The position code features of the PLWAP tree are used to mine these trees efficiently and extract the current frequent patterns when the database is updated. Both approaches update old frequent patterns more quickly, without re-scanning the entire updated database.
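
The role of the tolerance support t can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the PL4UP algorithm itself: it tracks only single-item support counts rather than a PLWAP tree with position codes, and the function names (`item_supports`, `classify`, `incremental_update`) and the label "potentially frequent" are hypothetical, chosen for illustration. It shows only how counts kept from an earlier run can be combined with counts from the newly added sequences, so that the status of each item is re-derived without re-reading the old sequences.

```python
from collections import Counter
from math import ceil


def item_supports(sequences):
    """Count, for each item, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))  # set(): one vote per sequence
    return counts


def classify(counts, total, s, t):
    """Label each item against minimum support s and tolerance support t (t < s)."""
    min_s = ceil(s * total)  # absolute count needed to be frequent
    min_t = ceil(t * total)  # absolute count needed to be retained
    labels = {}
    for item, c in counts.items():
        if c >= min_s:
            labels[item] = "frequent"
        elif c >= min_t:
            labels[item] = "potentially frequent"  # illustrative label
        else:
            labels[item] = "small"
    return labels


def incremental_update(old_counts, old_total, new_sequences, s, t):
    """Merge stored counts with counts from the new sequences only."""
    counts = old_counts + item_supports(new_sequences)
    total = old_total + len(new_sequences)
    return counts, total, classify(counts, total, s, t)


# Toy data (invented for illustration): each sequence is a list of page ids.
old_db = [list("abc"), list("ab"), list("ad"), list("b")]
new_db = [list("dc"), list("d"), list("de")]
s, t = 0.5, 0.25

old_counts = item_supports(old_db)
before = classify(old_counts, len(old_db), s, t)
counts, total, after = incremental_update(old_counts, len(old_db), new_db, s, t)
print(before)
print(after)
```

In this toy run, only `new_db` is scanned during the update; the old sequences contribute through `old_counts` alone. Item d moves from below s to frequent, while a and b drop below s, which is the kind of status change the two proposed algorithms are designed to handle on the tree itself.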