Journal: Computational Statistics & Data Analysis

Nearest-neighbor classification with categorical variables

Abstract

A technique is presented for adapting nearest-neighbor classification to the case of categorical variables. The set of categories is mapped onto the real line in such a way as to maximize the ratio of total sum of squares to within-class sum of squares, aggregated over classes. The resulting real values then replace the categories, and nearest-neighbor classification proceeds with the Euclidean metric on these new values. Continuous variables can be included in this scheme with little added effort. This approach has been implemented in a computer program and tried on a number of data sets, with encouraging results.

Nearest-neighbor classification is a well-known and effective classification technique. With this scheme, an unknown item's distances to all known items are measured, and the unknown item's class is estimated by the class of the nearest neighbor or by the class most often represented among a set of nearest neighbors. This has proven effective in many examples, but an appropriate distance normalization is required when variables are scaled differently. For categorical variables, "distance" is not even defined. In this paper, categorical data values are replaced by real numbers in an optimal way; those real numbers are then used in nearest-neighbor classification.
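The idea in the abstract can be illustrated for a single categorical variable with a minimal sketch. It is not the authors' program. It assumes that, for one variable, maximizing total over within-class sum of squares is equivalent to maximizing between-class over total sum of squares, which can be solved with a correspondence-analysis-style SVD of the category-by-class contingency table. The function names (`category_scores`, `ss_ratio`, `nearest_neighbor_predict`) and the toy data are hypothetical, and the abstract does not say how the paper treats several variables jointly, so that case is not covered here.

```python
import numpy as np

def category_scores(x, y):
    """Real-valued scores for the categories of one variable x, chosen to
    maximize total SS / within-class SS with respect to class labels y.
    Assumption: single-variable case, solved via an SVD (not confirmed as
    the paper's algorithm). Sign and scale of the scores are arbitrary."""
    cats, xi = np.unique(x, return_inverse=True)
    classes, yi = np.unique(y, return_inverse=True)
    # Contingency table: rows are categories, columns are classes.
    N = np.zeros((len(cats), len(classes)))
    np.add.at(N, (xi, yi), 1.0)
    n = N.sum()
    r = N.sum(axis=1)  # category counts
    c = N.sum(axis=0)  # class counts
    # Standardized residuals from independence; the leading left singular
    # vector, rescaled by 1/sqrt(r), gives the maximizing scores.
    S = (N - np.outer(r, c) / n) / np.sqrt(np.outer(r, c))
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    scores = U[:, 0] / np.sqrt(r)
    return dict(zip(cats.tolist(), scores.tolist()))

def ss_ratio(values, y):
    """Total sum of squares divided by within-class sum of squares."""
    values = np.asarray(values, dtype=float)
    y = np.asarray(y)
    total = ((values - values.mean()) ** 2).sum()
    within = sum(((values[y == g] - values[y == g].mean()) ** 2).sum()
                 for g in np.unique(y))
    return total / within

def nearest_neighbor_predict(train_values, train_labels, query_values):
    """1-nearest-neighbor prediction with the Euclidean metric."""
    train = np.asarray(train_values, dtype=float).reshape(len(train_labels), -1)
    query = np.asarray(query_values, dtype=float).reshape(-1, train.shape[1])
    d2 = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(train_labels)[d2.argmin(axis=1)]

# Toy data, invented for illustration only.
x = ["red", "red", "red", "blue", "blue", "green", "green", "green"]
y = ["A", "A", "B", "B", "B", "A", "A", "A"]

scores = category_scores(x, y)
train_values = [scores[c] for c in x]
print(scores)
print(ss_ratio(train_values, y))
print(nearest_neighbor_predict(train_values, y, [scores["blue"], scores["green"]]))
```

Once the categories have been replaced by scores, a continuous variable could be appended as an extra column of `train_values` after suitable scaling; how the paper normalizes such columns is not stated in the abstract.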