Journal: Expert Systems with Applications

C3Ro: An efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data

Abstract

Sequential pattern mining has been the focus of many works, but it still faces a tough challenge when mining large databases, in terms of both efficiency and the apprehensibility of the resulting pattern set. To overcome these issues, the most promising direction taken in the literature relies on constraints, including the well-known closedness constraint. However, such mining is not resistant to noise, which is a characteristic of most real-world data. The main research question raised in this paper is thus: how can an apprehensible set of sequential patterns be mined efficiently from noisy data? To address this question, we introduce 1) two original constraints designed for mining noisy data, the robustness and the extended-closedness constraints, and 2) a generic pattern mining algorithm, C3Ro, designed to mine a wide range of sequential patterns, from closed or maximal contiguous sequential patterns to closed or maximal regular sequential patterns. C3Ro is intended for practitioners and is able to handle their multiple constraints. C3Ro is also the first sequential pattern mining algorithm to be this generic and parameterizable. Extensive experiments reveal the high efficiency of C3Ro over well-known algorithms from the literature, especially on large datasets. Additional experiments were conducted on a noisy real-world dataset of job offers, with the goal of mining activities. These experiments offer more thorough insight into the C3Ro algorithm: job market experts confirm that the constraints we introduced have a significant positive impact on the apprehensibility of the set of mined activities. © 2019 Elsevier Ltd. All rights reserved.
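
To make the terms in the abstract concrete, the following is a minimal Python sketch of mining closed contiguous sequential patterns under a minimum support threshold. It is not C3Ro: the robustness and extended-closedness constraints are not defined in the abstract and are not implemented here. The function names (`frequent_contiguous_patterns`, `closed_patterns`), the toy database, and the threshold are all assumptions made for illustration. Support is taken to be the number of sequences that contain a pattern as a contiguous run, and a pattern is treated as closed when no contiguous super-pattern has the same support.

```python
from collections import defaultdict


def frequent_contiguous_patterns(sequences, min_support):
    """Return {pattern: support} for every contiguous pattern whose
    support (number of sequences containing it) is >= min_support.
    Brute-force enumeration, suitable only for small illustrations."""
    support = defaultdict(int)
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                seen.add(tuple(seq[i:j]))
        for pattern in seen:
            support[pattern] += 1
    return {p: s for p, s in support.items() if s >= min_support}


def closed_patterns(frequent):
    """Keep only patterns with no one-item contiguous extension of equal
    support. Checking one-item extensions is enough because support can
    only decrease as a pattern grows."""
    absorbed = set()
    for q, s in frequent.items():
        if len(q) > 1:
            for sub in (q[1:], q[:-1]):  # drop the first or the last item
                if frequent.get(sub) == s:
                    absorbed.add(sub)
    return {p: s for p, s in frequent.items() if p not in absorbed}


# Hypothetical toy database; "x" in the last sequence plays the role of noise.
database = [
    ["a", "b", "c", "d"],
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["a", "b", "x", "c"],
]

frequent = frequent_contiguous_patterns(database, min_support=2)
closed = closed_patterns(frequent)
for pattern, count in sorted(closed.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" ".join(pattern), count)
```

In this toy database, the pattern `a b c` reaches a support of only 2, because the extra item `x` in the last sequence breaks the strict contiguous match. This is the kind of sensitivity to noise that the abstract says the robustness constraint is designed to address; how C3Ro handles it is described in the paper itself, not in this sketch.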
