Expert Systems with Applications

Minimal infrequent pattern based approach for mining outliers in data streams


Abstract

Outlier detection is an important data mining task that aims at detecting patterns that are unusual in a dataset. Although several techniques have proved useful for solving some outlier detection problems, certain issues remain unresolved. Most existing methods compute distances between points in the full-dimensional space to detect outliers. In high-dimensional spaces, however, the notion of proximity may not be qualitatively meaningful because of the curse of dimensionality, and computing it incurs a high computational cost. Moreover, existing methods focus on discovering outliers but do not indicate which subspaces cause the abnormality. Frequent pattern mining based approaches resolve these issues. Recently, infrequent pattern mining, which aims at discovering rare associations, has attracted the attention of the data mining research community, and research in this area motivated the proposal of a new method for detecting outliers in data streams. In some domains, such as fraudulent credit transactions and anomaly detection, infrequent patterns are more interesting than frequent ones, and mining them facilitates detecting outliers. Minimal infrequent patterns are the generators of the family of infrequent patterns, in the sense that every infrequent pattern is a superset of at least one minimal infrequent pattern. In this paper, a novel method is presented that detects outliers by mining minimal infrequent patterns from data streams. Three measures are defined: the Transaction Weighting Factor (TWF), the Minimal Infrequent Deviation Factor (MIPDF) and the Minimal Infrequent Pattern based Outlier Factor (MIFPOF). Based on the mined minimal infrequent patterns, an algorithm called Minimal Infrequent Pattern based Outlier Detection (MIFPOD) is proposed for detecting outliers in data streams. The effectiveness of the proposed method is demonstrated on a synthetic dataset derived from vital-sign data collected from body sensors and on a publicly available real dataset. The experimental results show that the proposed method outperforms existing methods in detecting outliers.
(C) 2014 Elsevier Ltd. All rights reserved.
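The abstract does not give the formulas for TWF, MIPDF or MIFPOF, so the following Python sketch only illustrates the underlying idea under stated assumptions. It uses the usual definition of a minimal infrequent itemset (support below a threshold while every proper subset is frequent) and mines such itemsets from one window of transactions with a level-wise, Apriori-style search. The function names `minimal_infrequent_patterns` and `mip_count_score`, the parameter `min_sup`, and the scoring rule (count how many minimal infrequent patterns a transaction contains) are all hypothetical and are not the paper's MIFPOD algorithm or MIFPOF measure.

```python
from itertools import combinations


def support(itemset, window):
    """Count the transactions in `window` that contain every item of `itemset`."""
    return sum(1 for t in window if itemset <= t)


def minimal_infrequent_patterns(window, min_sup):
    """Level-wise (Apriori-style) search for minimal infrequent itemsets.

    An itemset is minimal infrequent if its support is below `min_sup` while
    every proper subset is frequent. Assumes len(window) >= min_sup, so the
    empty itemset counts as frequent.
    """
    items = sorted(set().union(*window))
    mips, frequent = [], []
    for item in items:
        single = frozenset([item])
        if support(single, window) < min_sup:
            mips.append(single)  # rare single item: its only subset (empty set) is frequent
        else:
            frequent.append(single)

    k = 1
    while frequent:
        frequent_set = set(frequent)
        # Candidate (k+1)-itemsets are unions of two frequent k-itemsets.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
        next_frequent = []
        for cand in candidates:
            # Skip candidates that have an infrequent k-subset: they cannot be minimal.
            if not all(frozenset(s) in frequent_set for s in combinations(cand, k)):
                continue
            if support(cand, window) < min_sup:
                mips.append(cand)  # infrequent, yet every k-subset is frequent
            else:
                next_frequent.append(cand)
        frequent, k = next_frequent, k + 1
    return mips


def mip_count_score(transaction, mips):
    """Hypothetical outlier score: number of minimal infrequent patterns
    contained in the transaction. This is NOT the paper's MIFPOF."""
    return sum(1 for p in mips if p <= transaction)


# One window of a stream (hypothetical data); in a stream the window would slide.
window = [frozenset(t) for t in ("ab", "ab", "ac", "ac", "bc", "ad")]
mips = minimal_infrequent_patterns(window, min_sup=2)
scores = [mip_count_score(t, mips) for t in window]
print(sorted("".join(sorted(p)) for p in mips))  # ['bc', 'd']
print(scores)  # [0, 0, 0, 0, 1, 1]
```

In this toy window, items `b` and `c` are each frequent but rarely occur together, so `{b, c}` is a minimal infrequent pattern, while `d` is rare on its own. The last two transactions are the ones flagged. Support is recounted by a full scan for every candidate, which keeps the sketch short but would be too slow for a real stream.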