Journal: International Journal of Computer Systems Science & Engineering

Robust local outlier detection with statistical parameter for big data


Abstract

With the rapid expansion of data scale, big data mining and analytics have attracted increasing attention. Outlier detection, an important data mining task, is widely used in many applications. However, conventional outlier detection methods have difficulty handling large datasets. In addition, most existing outlier detection methods can identify only global outliers and are overly sensitive to parameter variation. In this paper, we propose a novel method for robust local outlier detection with a statistical parameter, which incorporates new ideas for dealing with big data. The method effectively identifies both global and local outliers and depends on only one statistical parameter. Furthermore, this sole parameter can be determined easily without relying on the user's domain knowledge. Most importantly, the method is insensitive to parameter variation, which makes it more robust. The properties of the method are investigated, and its performance is verified experimentally on synthetic and publicly available datasets. The experiments demonstrate the accuracy and efficiency of the method in identifying local outliers. Moreover, the method is shown to be more robust to parameter variation than the well-known local outlier detection method LOF and two other representative outlier detection methods, DB and DBSCAN. The results show that the proposed method has advantages in handling big data.
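The abstract does not specify how the proposed method works, so it cannot be sketched here. For context, the following is a minimal sketch of the baseline it is compared against, the Local Outlier Factor (LOF), written from the standard definition. The function name `lof_scores`, the toy data, and the simplification of using exactly `k` nearest neighbors (ignoring distance ties) are assumptions made for illustration and do not come from the paper.

```python
import math

def lof_scores(points, k):
    """Return the Local Outlier Factor of each point (illustrative sketch).

    k is the user-chosen neighborhood size (MinPts in LOF terminology).
    Assumes all points are distinct; duplicates are not handled specially.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    neighbors = []  # indices of the k nearest neighbors of each point
    k_dist = []     # distance from each point to its k-th nearest neighbor
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in ranked[:k]])
        k_dist.append(ranked[k - 1][0])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: mean density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Toy data (hypothetical): a dense cluster, a sparse cluster, and one point
# that lies close to the dense cluster but outside it (a local outlier).
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(10 + 2.0 * i, 10 + 2.0 * j) for i in range(5) for j in range(5)]
points = dense + sparse + [(1.0, 1.0)]

scores = lof_scores(points, k=3)
print("most outlying point:", points[scores.index(max(scores))])
```

Scores near 1 indicate a point about as dense as its neighbors, and larger scores indicate a local outlier. The result depends on the choice of `k`, which is the kind of parameter sensitivity the abstract refers to.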