Venue: International Joint Conference on Neural Networks (IJCNN)

The unbalancing effect of hubs on K-medoids clustering in high-dimensional spaces


Abstract

Unbalanced cluster solutions are characterized by very different cluster sizes, with some clusters being very large while others contain almost no data. We demonstrate that this phenomenon is connected to ‘hubness’, a recently discovered general problem of machine learning in high-dimensional data spaces. Hub objects have a small distance to an exceptionally large number of data points, and anti-hubs are far from all other data points. In an empirical study of K-medoids clustering, we show that hubness gives rise to very unbalanced cluster sizes, resulting in impaired internal and external evaluation indices. We compare three methods that reduce hubness in the distance spaces and show that the evaluation indices improve as the clusters become more balanced. The experiments use artificial and real data sets from diverse domains.
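To make the notion of hubs concrete, here is a minimal Python sketch, not the authors' code. It assumes Euclidean distance and quantifies hubness as the skewness of the k-occurrence distribution N_k (how often each point appears among the k nearest neighbours of the other points), a measure common in the hubness literature but not specified in this abstract. The function names `k_occurrence`, `skewness`, and `hubness_of_gaussian_sample` and the defaults n=200, k=5 are hypothetical choices for illustration. The sketch only measures hubness; it does not run K-medoids or any of the three hubness-reduction methods, which the abstract does not name.

```python
import math
import random


def k_occurrence(points, k):
    """Return N_k for every point: how many times it appears among the
    k nearest neighbours (Euclidean distance, self excluded) of the other points."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Distances from point i to every other point, sorted ascending.
        neighbours = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in neighbours[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population (biased) skewness: third central moment divided by std^3."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # constant sample: no asymmetry
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / var ** 1.5


def hubness_of_gaussian_sample(dim, n=200, k=5, seed=0):
    """Skewness of N_k for n i.i.d. standard-normal points in `dim` dimensions."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    return skewness(k_occurrence(points, k))


if __name__ == "__main__":
    for dim in (3, 100):
        print(f"dim={dim:3d}  skewness of N_5 = {hubness_of_gaussian_sample(dim):.2f}")
```

Because every point contributes exactly k neighbours, the N_k values always sum to n·k, so their mean is k; hubs are the points in the long right tail of this distribution, and anti-hubs are the points with N_k near 0. For i.i.d. Gaussian data one would typically expect the skewness at `dim=100` to come out clearly larger than at `dim=3`, though the exact values depend on the seed and sample size.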
