Journal: Knowledge-Based Systems

FastRCA-Seq: An efficient approach for extracting hierarchies of multilevel closed partially-ordered patterns


Abstract

Discovering concise representations of sequential patterns in sequential data is a well-established data mining task. Recently, Nica et al. put forward an original approach, RCA-Seq, for directly extracting a hierarchy of multilevel closed partially-ordered patterns (MCPO-patterns) from a sequence database within the Relational Concept Analysis (RCA) framework. RCA-Seq has been applied successfully to small (~1,000 sequences) but interesting real hydro-ecological datasets. However, RCA-Seq focuses only on providing comprehensible results, to the detriment of performance. To improve the performance of RCA-Seq, we propose a new approach, FastRCA-Seq, that stems from RCA-Seq and whose contributions benefit two fields: Formal Concept Analysis, namely its RCA extension, and sequential pattern mining. FastRCA-Seq spans two key steps: the exploration of sequential data based on RCA, and the extraction of MCPO-patterns by navigating the RCA result. Firstly, our approach introduces an effective RCA implementation based on bit-array representations, bitwise operations, parallel computing, and several new properties of RCA that may prevent expensive computations. In addition, we state the bottleneck of RCA. Secondly, FastRCA-Seq is a self-contained approach for directly and efficiently mining hierarchies of MCPO-patterns from sequential data. We assess FastRCA-Seq on various benchmark datasets, namely Gazelle, Kosarak, and FIFA. The results show that FastRCA-Seq outperforms RCA-Seq in terms of execution time (on average about 169 times faster) and memory usage (on average about 42% less) while preserving the interpretability and usability of the results for stakeholders. © 2020 Elsevier B.V. All rights reserved.
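
The abstract mentions bit-array representations and bitwise operations but gives no implementation details. The following is only a minimal sketch of the general idea as it is commonly used in Formal Concept Analysis, not the authors' FastRCA-Seq code. It assumes a toy formal context in which each object's attribute set is stored as an integer bitmask; the names `extent`, `intent`, `closure`, `ROWS`, and `N_ATTRS` are hypothetical and chosen for illustration.

```python
from typing import List


def extent(rows: List[int], attrs: int) -> int:
    """Return a bitmask of the objects whose row contains every attribute in attrs."""
    objs = 0
    for i, row in enumerate(rows):
        if (row & attrs) == attrs:  # subset test done with a single AND
            objs |= 1 << i
    return objs


def intent(rows: List[int], objs: int, n_attrs: int) -> int:
    """Return a bitmask of the attributes shared by all objects in objs."""
    common = (1 << n_attrs) - 1  # start from the full attribute set
    for i, row in enumerate(rows):
        if (objs >> i) & 1:
            common &= row  # set intersection done with a single AND
    return common


def closure(rows: List[int], attrs: int, n_attrs: int) -> int:
    """Closure of an attribute set: the intent of its extent."""
    return intent(rows, extent(rows, attrs), n_attrs)


# Toy context (an assumption for illustration): 3 objects, 3 attributes,
# with bit 0 = a, bit 1 = b, bit 2 = c.
ROWS = [0b011, 0b111, 0b110]
N_ATTRS = 3

if __name__ == "__main__":
    # Every object that has attribute a also has b, so {a} closes to {a, b}.
    print(bin(closure(ROWS, 0b001, N_ATTRS)))
```

In this toy context an attribute set is closed exactly when `closure` returns it unchanged. Representing sets as machine words lets subset tests and intersections run as single bitwise instructions, which is the usual motivation for this kind of encoding.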
