Journal: Data Mining and Knowledge Discovery

On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

Abstract

The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known about the strengths and weaknesses of different standard outlier detection models, or about the impact of parameter choices on these algorithms. The scarcity of appropriate benchmark datasets with ground-truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly proposed outlier detection methods improve over established ones. In this paper, we perform an extensive experimental study of the performance of a representative set of standard k-nearest-neighbor-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we characterize the datasets themselves and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for evaluating outlier detection results.
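To make the evaluation setting concrete, here is a minimal sketch, assuming Euclidean data, binary ground-truth labels (1 = outlier, 0 = inlier), and the classic kNN outlier score (distance to the k-th nearest neighbor) as one representative of the k-nearest-neighbor-based family. All function names are hypothetical. It computes two commonly used measures, ROC AUC and precision at n, plus a chance-adjusted precision at n. The adjustment shown, (P - |O|/N) / (1 - |O|/N) with n = |O|, is a generic correction for the expected score of a random ranking and is an illustrative assumption, not necessarily the exact adaptation proposed in the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_outlier_scores(data: List[Point], k: int) -> List[float]:
    """Score each point by its Euclidean distance to its k-th nearest
    neighbor (the point itself is excluded). Larger = more outlying."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(data):
        dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    return scores


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random outlier is scored higher
    than a random inlier; tied pairs count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """Fraction of true outliers among the n highest-scored points."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n]) / n


def adjusted_precision_at_n(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance-adjusted precision at n with n = number of outliers |O|.
    Assumption: a random ranking has expected precision |O|/N, so this
    maps random rankings to about 0 and a perfect ranking to 1."""
    num_outliers = sum(labels)
    total = len(labels)
    if num_outliers == 0 or num_outliers == total:
        raise ValueError("need at least one outlier and one inlier")
    expected = num_outliers / total
    p = precision_at_n(scores, labels, num_outliers)
    return (p - expected) / (1.0 - expected)
```

On a toy set of four clustered inliers plus one distant point labeled as an outlier, for example `data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]` with `k=2`, the distant point receives the largest score, so ROC AUC, precision at 1, and the adjusted precision are all 1.0. The pairwise loops are quadratic and only keep the sketch readable; a real benchmark study would use a spatial index or an established library implementation.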