Conference: SIAM International Conference on Data Mining

CISpan: Comprehensive Incremental Mining Algorithms of Closed Sequential Patterns for Multi-Versional Software Mining

Abstract

Recently, frequent sequential pattern mining algorithms have been widely used in the software engineering field to mine various source code or specification patterns. In practice, software evolves from one version to another over its life span. The effort of mining frequent sequential patterns across multiple versions of a software system can be substantially reduced by efficient incremental mining. The problem is challenging in this domain because the databases are usually updated in many ways, including insertion, various modifications, and removal of sequences. In addition, different mining tools may impose different mining constraints, such as a low minimum support. None of the existing work can be applied effectively because of its various limitations. For example, our recent work, IncSpan, fails to solve the problem because it can handle neither a low minimum support nor the removal of sequences from the database. In this paper, we propose a novel, comprehensive incremental mining algorithm for frequent sequential patterns, CISpan (Comprehensive Incremental Sequential Pattern mining). CISpan supports both closed and complete incremental frequent sequence mining under all kinds of updates to the database. Compared to IncSpan, CISpan tolerates a wide range of minimum support thresholds (as low as 2). Our performance study shows that, in addition to handling test cases on which IncSpan fails, CISpan outperforms IncSpan in all test cases that IncSpan can handle, across various sequence lengths, numbers of sequences, modification ratios, etc., with an average speedup of 3.4 times. We also tested CISpan's performance on databases transformed from 20 consecutive versions of the Linux kernel source code. On average, CISpan outperforms the non-incremental CloSpan by a factor of 42.
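The terms used in the abstract (support, minimum support, closed pattern, insertion and removal of sequences) can be made concrete with a small sketch. This is not the CISpan algorithm, whose details the abstract does not give; it is a brute-force illustration of the definitions, and every function name and the toy call-sequence data are assumptions made for this example. It re-mines each version from scratch, which is the non-incremental baseline that incremental mining aims to avoid.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, database):
    """Number of database sequences that contain pattern as a subsequence."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))

def frequent_patterns(database, min_support):
    """Brute force: test every subsequence of every sequence (toy sizes only)."""
    candidates = set()
    for seq in database:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= min_support:
            result[pattern] = count
    return result

def closed_patterns(frequent):
    """Keep patterns that have no longer super-sequence with the same support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
    }

# Hypothetical version 1: each sequence is a list of calls from one function.
db_v1 = [
    ("lock", "read", "unlock"),
    ("lock", "write", "unlock"),
    ("open", "close"),
]
# Hypothetical version 2: one sequence removed, one inserted.
db_v2 = [
    ("lock", "read", "unlock"),
    ("open", "close"),
    ("open", "read", "close"),
]

MIN_SUPPORT = 2  # the lowest threshold mentioned in the abstract
closed_v1 = closed_patterns(frequent_patterns(db_v1, MIN_SUPPORT))
closed_v2 = closed_patterns(frequent_patterns(db_v2, MIN_SUPPORT))
print(closed_v1)
print(closed_v2)
```

In this toy data, a single removal plus a single insertion changes the set of closed patterns entirely, which suggests why handling removals and low support thresholds matters when mining across software versions.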
