A permutation test approach to the choice of size k for the nearest neighbors classifier

Yinglei Lai; Baolin Wu; Hongyu Zhao


Abstract

The k nearest neighbors (k-NN) classifier is one of the most popular methods for statistical pattern recognition and machine learning. In practice, the size k, the number of neighbors used for classification, is usually set arbitrarily to one or some other small number, or chosen by a cross-validation procedure. In this study, we propose a novel alternative approach to decide the size k. Based on a k-NN-based multivariate multi-sample test, we assign each k a permutation-test-based Z-score. The number of nearest neighbors is set to the k with the highest Z-score. This approach is computationally efficient since we have derived the formulas for the mean and variance of the test statistic under the permutation distribution for multiple sample groups. Several simulated and real-world data sets are analyzed to investigate the performance of our approach. The usefulness of our approach is demonstrated through the evaluation of prediction accuracies using the Z-score as a criterion to select the size k. We also compare our approach to the widely used cross-validation approaches. The results show that the size k selected by our approach yields high prediction accuracies when informative features are used for classification, whereas the cross-validation approach may fail in some cases.
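The selection rule described in the abstract (score each candidate k with a permutation-test Z-score, then keep the k with the highest score) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the test statistic is assumed to be the count of (point, neighbor) pairs that share a class label, distances are assumed Euclidean, and the permutation mean and variance are estimated by Monte Carlo label shuffling rather than by the closed-form formulas the paper derives. The function names `knn_indices`, `same_label_count`, `z_scores`, and `choose_k` are hypothetical.

```python
import numpy as np


def knn_indices(X, k_max):
    """Indices of each point's k_max nearest neighbors (self excluded)."""
    # Pairwise Euclidean distances; an assumed metric, not taken from the paper.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return np.argsort(d, axis=1)[:, :k_max]


def same_label_count(y, nbrs):
    """Assumed statistic: number of (point, neighbor) pairs sharing a label."""
    return int(np.sum(y[nbrs] == y[:, None]))


def z_scores(X, y, ks, n_perm=500, seed=0):
    """Z-score of the statistic for each k, against shuffled labels."""
    rng = np.random.default_rng(seed)
    nbrs_all = knn_indices(X, max(ks))
    scores = {}
    for k in ks:
        nbrs = nbrs_all[:, :k]  # neighbors depend on X only, so reuse them
        observed = same_label_count(y, nbrs)
        # Monte Carlo stand-in for the analytic permutation mean and variance.
        null = np.array(
            [same_label_count(rng.permutation(y), nbrs) for _ in range(n_perm)]
        )
        scores[k] = (observed - null.mean()) / null.std(ddof=1)
    return scores


def choose_k(X, y, ks, **kwargs):
    """Return the k with the highest Z-score, plus all scores."""
    scores = z_scores(X, y, ks, **kwargs)
    return max(scores, key=scores.get), scores
```

A call such as `choose_k(X, y, ks=[1, 3, 5, 7])`, with `X` an `(n, p)` feature array and `y` a NumPy array of class labels, returns the selected k and a dictionary of Z-scores. Because the neighbor structure depends only on the features, it is computed once and reused across permutations; the closed-form moments mentioned in the abstract would remove the shuffling loop entirely.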


Bibliographic details

Source
Journal of Applied Statistics, 2011, No. 10, pp. 2289-2302 (14 pages)
Authors
Yinglei Lai; Baolin Wu; Hongyu Zhao
Author affiliations

Department of Statistics and Biostatistics Center, The George Washington University, 2140 Pennsylvania Avenue, N.W., Washington, DC 20052, USA

Division of Biostatistics, School of Public Health, University of Minnesota, A442 Mayo Building, MMC 303, 420 Delaware St SE, Minneapolis, MN 55455, USA

Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA

Indexing information
Original format: PDF
Language: English
Keywords
nearest neighbors classifier; number of neighbors; permutation test; prediction accuracy; cross-validation

