Journal: IEEE Transactions on Knowledge and Data Engineering

Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection



Abstract

Outlier detection in high-dimensional data presents various challenges resulting from the “curse of dimensionality.” A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers. In this paper, we provide evidence supporting the opinion that such a view is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings. Furthermore, we show that high dimensionality can have a different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points’ reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provide insight into how some points (antihubs) appear very infrequently in k-NN lists of other points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-detection methods. By evaluating the classic k-NN method, the angle-based technique designed for high-dimensional data, the density-based local outlier factor and influenced outlierness methods, and antihub-based methods on various synthetic and real-world data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.
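
To make the notion of reverse-neighbor counts concrete, the following is a minimal sketch, not the authors' implementation. It counts how often each point appears in the k-NN lists of the other points and turns low counts into high outlier scores. The function names `reverse_neighbor_counts` and `antihub_scores`, the brute-force Euclidean distance computation, and the scoring function 1 / (count + 1) are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def reverse_neighbor_counts(X, k):
    """Count, for each point, how many other points list it among their k nearest neighbors.

    Hypothetical helper: brute-force Euclidean distances, suitable for small data only.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # A point is never counted as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k nearest neighbors of every point, shape (n, k).
    knn = np.argsort(d2, axis=1)[:, :k]
    # How often each index occurs across all k-NN lists.
    return np.bincount(knn.ravel(), minlength=n)


def antihub_scores(X, k):
    """Map reverse-neighbor counts to outlier scores (higher = more outlying).

    Assumption: any decreasing function of the count would do; 1 / (count + 1) is one choice.
    """
    return 1.0 / (reverse_neighbor_counts(X, k) + 1.0)


# Toy data: four points close together on a line and one point far away.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])
counts = reverse_neighbor_counts(X_toy, k=2)
scores = antihub_scores(X_toy, k=2)
print(counts, scores)
```

In this toy data set the distant point appears in no other point's 2-NN list, so it receives the highest score. The counts always sum to n times k, since every point contributes exactly k entries to the lists.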
