Journal: Knowledge-Based Systems

MiFI-Outlier: Minimal infrequent itemset-based outlier detection approach on uncertain data stream



Abstract

Many outlier detection approaches have been proposed for static datasets over the past twenty years, with good results. In practice, uncertain data streams are increasingly common, but most existing outlier detection approaches are not suited to them. In addition, many approaches do not consider how frequently each element appears, so the outliers they detect may not match the definition of an outlier. Itemset-based outlier detection approaches offer a good solution to this problem and have attracted growing attention in recent years. This paper proposes a novel two-step approach based on minimal infrequent itemsets, called MiFI-Outlier, to detect outliers in uncertain data streams effectively. In the itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine the minimal infrequent itemsets (MiFIs) from an uncertain data stream; an improved method, MiFI-UDSM*, then mines these itemsets more effectively using the ideas of "item cap" and "support cap". In the outlier detection phase, three deviation indices are defined on the mined MiFIs to measure the degree of deviation of each transaction: the minimal infrequent itemset deviation index (MiFIDI), the similarity deviation index (SDI), and the transaction deviation index (TDI). MiFI-Outlier is then used to identify the outliers in the uncertain data stream. Experiments on public and synthetic datasets show that the proposed approaches outperform the compared approaches in both the infrequent itemset mining phase and the outlier detection phase. © 2019 Elsevier B.V. All rights reserved.
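To make the notion of a minimal infrequent itemset on uncertain data concrete, here is a minimal brute-force sketch in Python. It is not the matrix-based MiFI-UDSM or MiFI-UDSM* method from the paper, and it does not compute the MiFIDI, SDI, or TDI indices, whose formulas the abstract does not give. It assumes the expected-support model commonly used for uncertain data (each transaction maps an item to its existence probability, and an itemset's expected support is the sum over transactions of the product of its items' probabilities). The function names, the `min_esup` threshold, and the sample data are all hypothetical.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, transactions):
    """Expected support under the (assumed) independent-item model:
    sum over transactions of the product of the item probabilities.
    An item missing from a transaction has probability 0."""
    return sum(
        prod(t.get(item, 0.0) for item in itemset)
        for t in transactions
    )


def minimal_infrequent_itemsets(transactions, min_esup):
    """Brute-force, level-wise search (illustrative only, exponential cost).

    An itemset is minimal infrequent if its expected support is below
    min_esup while every proper subset is frequent."""
    items = sorted({item for t in transactions for item in t})
    minimal = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # A superset of a known minimal infrequent itemset cannot be minimal.
            if any(m <= cset for m in minimal):
                continue
            # All proper subsets are frequent here, so infrequent means minimal.
            if expected_support(candidate, transactions) < min_esup:
                minimal.append(cset)
    return minimal


# Hypothetical uncertain transactions: item -> existence probability.
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.8, "b": 0.7},
    {"a": 0.7, "c": 0.2},
    {"b": 0.9, "c": 0.1},
]

mifis = minimal_infrequent_itemsets(transactions, min_esup=1.5)
print([sorted(s) for s in mifis])  # [['c'], ['a', 'b']]
```

In this toy data, item `c` is infrequent on its own, while `a` and `b` are each frequent but their combination is not, so `{a, b}` is minimal infrequent. A stream setting would additionally need a sliding window over the transactions, which this sketch leaves out.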
