Hubness-Aware Shared Neighbor Distances for High-Dimensional κ-Nearest Neighbor Classification


Abstract

Learning from high-dimensional data is usually a challenging task, as captured by the well-known phrase "curse of dimensionality". Most distance-based methods become impaired due to the distance concentration of many widely used metrics in high-dimensional spaces. One recently proposed approach suggests that using secondary distances based on the number of shared κ-nearest neighbors between different points might partly resolve the concentration issue, thereby improving overall performance. Nevertheless, the curse of dimensionality also affects κ-nearest neighbor inference in severely negative ways, one such consequence being known as hubness. The impact of hubness on forming shared neighbor distances has not been discussed before, and it is what we focus on in this paper. Furthermore, we propose a new method for calculating the secondary distances which is aware of the underlying neighbor occurrence distribution. Our experiments suggest that this new approach achieves consistently superior performance on all considered high-dimensional data sets. An additional benefit is that it requires essentially no extra computation compared to the original methods.
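To make the idea of a shared-neighbor secondary distance concrete, here is a minimal sketch in Python. It assumes one common formulation, in which the distance between two points is one minus the fraction of their κ-nearest neighbors that they share, and it also counts how often each point occurs in other points' neighbor lists, which is the quantity hubness refers to. The function names, the Euclidean primary distance, and the use of NumPy are illustrative assumptions. This is not the hubness-aware method proposed in the paper, whose weighting scheme the abstract does not specify.

```python
import numpy as np


def knn_indices(X, k):
    """Return, for each row of X, the indices of its k nearest neighbors
    under Euclidean distance, excluding the point itself."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def shared_neighbor_distance(nn, n):
    """Secondary distance: 1 - (number of shared k-nearest neighbors) / k.
    Assumed formulation for illustration only."""
    k = nn.shape[1]
    # member[i, j] == 1 when point j is among the k nearest neighbors of point i.
    member = np.zeros((n, n), dtype=np.int64)
    member[np.arange(n)[:, None], nn] = 1
    shared = member @ member.T  # shared[i, j] = size of the neighbor-set overlap
    return 1.0 - shared / k


def k_occurrences(nn, n):
    """Count how many times each point appears in other points' k-NN lists.
    A heavily skewed distribution of these counts indicates hubness."""
    return np.bincount(nn.ravel(), minlength=n)
```

A typical use would be to call `knn_indices` on a data matrix, pass the result to `shared_neighbor_distance` to obtain the secondary distance matrix for a κ-nearest neighbor classifier, and inspect `k_occurrences` to see how uneven the neighbor occurrence distribution is.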
