Nearest Neighbours without k

Abstract

School of Computing and Mathematics, University of Ulster at Jordanstown, Northern Ireland, BT37 0QB, United Kingdom

In data mining, the k-Nearest-Neighbours (kNN) method for classification is simple and effective. The success of kNN in classification depends on the selection of a "good" value for k, so in a sense kNN is biased by k. However, it is unclear what a universally good value for k is. We propose to solve this choice-of-k issue with an alternative formalism that uses a sequence of values for k. Each value of k defines a neighbourhood for a data record: a set of k nearest neighbours, which contains some degree of support for each class with respect to the data record. Our aim is to select a set of neighbourhoods and aggregate their supports to create a classifier that is less biased by k. To this end we use a probability function G, defined in terms of a mass function for events weighted by a measurement of events. A mass function is an assignment of basic probability to events. In the case of classification, events can be interpreted as neighbourhoods, and the mass function can be interpreted in terms of class proportions in neighbourhoods; a mass function therefore represents degrees of support for a class in various neighbourhoods. We show that under this specification G is a linear function of the conditional probability of classes given a data record, and so can be used directly for classification. Based on these findings we propose a new classification procedure. Experiments show that this procedure is indeed less biased by k, and that it displays a saturating property as the number of neighbourhoods increases. Experiments further show that the performance of our procedure at saturation is comparable to the best performance of kNN. Consequently, when we use kNN for classification we need not be concerned with k; instead, we select a set of neighbourhoods and apply the procedure presented here.
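The following is a minimal sketch, in Python, of the aggregation idea described above. It assumes Euclidean distance, equal weights over neighbourhoods, and class proportions as the mass function; the function name aggregate_nn_classify and the equal-weighting scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def aggregate_nn_classify(X_train, y_train, x, ks=None):
    """Classify x by aggregating class supports over many neighbourhoods.

    Each k in `ks` defines a neighbourhood (the k nearest neighbours of x).
    The mass of class c in a neighbourhood is taken to be its class
    proportion there; G(c) is the equally weighted sum of these masses
    over all neighbourhoods (an assumed choice of event measurement).
    """
    if ks is None:
        ks = range(1, len(X_train) + 1)  # one neighbourhood per possible k
    # Sort the training points by distance to x once; every prefix of this
    # ordering is a k-nearest-neighbours set.
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))
    classes = np.unique(y_train)
    G = {c: 0.0 for c in classes}
    for k in ks:
        neighbourhood = y_train[order[:k]]
        for c in classes:
            # mass of class c in this neighbourhood, weighted equally
            G[c] += np.mean(neighbourhood == c) / len(ks)
    return max(G, key=G.get)  # predict the class maximising G

# Toy usage: two well-separated 1-D clusters.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y = np.array([0, 0, 0, 1, 1])
print(aggregate_nn_classify(X, y, np.array([0.15])))  # -> 0
```

Because the prediction averages over many neighbourhood sizes, adding further values of k changes G less and less, which is consistent with the saturating behaviour the abstract reports.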