Published in: Annals Data Science

A New Approach for Improving Classification Accuracy in Predictive Discriminant Analysis


Abstract

The focus of predictive discriminant analysis is to improve classification accuracy, and obtaining a statistically optimal classification accuracy, or hit rate, remains a challenge because of the inherent variability of most real-life datasets. Classification accuracy is usually improved by using the best subset of relevant predictors obtained with classical variable selection methods. The goal of these methods is to choose the best subset (or training sample) of relevant variables, which typically reduces model complexity and makes the model easier to interpret, improves its classification accuracy, and reduces training time. However, a statistically optimal hit rate can be achieved if the training sample meets a near-optimal condition, that is, if any significant differences in the variances of the groups formed by the dependent variable are resolved. This paper proposes a new approach for obtaining a near-optimal training sample that produces a statistically optimal hit rate, using a modified winsorization with a graphical diagnostic. When applied to real-life datasets, the proposed approach identified and removed legitimate contaminants in one or more predictors of the training sample, thereby resolving significant differences in the variances of the groups formed by the dependent variable. In addition, the graphical diagnostic associated with the new approach provides a useful visual tool that serves as an alternative graphical test for homogeneity of variances.
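The abstract does not spell out the "modified winsorization" or its graphical diagnostic, so the following is only a minimal sketch of the general idea under stated assumptions. It uses classical k-times winsorization applied separately within each group (not the paper's modified version), Hartley's F-max ratio as a simple numeric stand-in for a homogeneity-of-variances check (not the paper's graphical test), and a single-predictor linear discriminant rule to compute a hit rate. All function names and the data are hypothetical and for illustration only.

```python
import statistics
from math import log


def winsorize(values, k=1):
    """Classical k-times winsorization: the k smallest values are replaced by
    the (k+1)-th smallest, and the k largest by the (k+1)-th largest."""
    if not 0 <= 2 * k < len(values):
        raise ValueError("need 0 <= 2k < number of values")
    ordered = sorted(values)
    lo, hi = ordered[k], ordered[-k - 1]
    return [min(max(v, lo), hi) for v in values]


def winsorize_by_group(x, y, k=1):
    """Winsorize predictor x separately within each group of y (an assumption)."""
    out = list(x)
    for g in set(y):
        idx = [i for i, yi in enumerate(y) if yi == g]
        for i, v in zip(idx, winsorize([x[i] for i in idx], k)):
            out[i] = v
    return out


def fmax_ratio(x, y):
    """Hartley's F-max: largest group sample variance over the smallest."""
    variances = [statistics.variance([xi for xi, yi in zip(x, y) if yi == g])
                 for g in set(y)]
    return max(variances) / min(variances)


def fit_lda_1d(x, y):
    """One-predictor linear discriminant: group means, pooled variance, priors."""
    groups = sorted(set(y))
    n = len(x)
    means = {g: statistics.fmean([xi for xi, yi in zip(x, y) if yi == g])
             for g in groups}
    pooled = sum((xi - means[yi]) ** 2 for xi, yi in zip(x, y)) / (n - len(groups))
    priors = {g: y.count(g) / n for g in groups}
    return means, pooled, priors


def predict_lda_1d(xi, model):
    """Assign xi to the group with the largest linear discriminant score."""
    means, pooled, priors = model

    def score(g):
        mu = means[g]
        return xi * mu / pooled - mu * mu / (2 * pooled) + log(priors[g])

    return max(means, key=score)


def hit_rate(x, y, model):
    """Proportion of observations classified into their observed group."""
    return sum(predict_lda_1d(xi, model) == yi for xi, yi in zip(x, y)) / len(y)


# Illustrative data only: group "A" contains one extreme value (12.0).
X = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 12.0,
     6.0, 6.2, 5.9, 6.1, 6.3, 5.8, 6.0, 6.2, 6.1, 5.9]
Y = ["A"] * 10 + ["B"] * 10

if __name__ == "__main__":
    X_w = winsorize_by_group(X, Y, k=1)
    print("F-max before winsorizing:", round(fmax_ratio(X, Y), 1))
    print("F-max after winsorizing: ", round(fmax_ratio(X_w, Y), 1))
    # Both models are scored on the original (unwinsorized) observations.
    print("hit rate, raw training sample:       ", hit_rate(X, Y, fit_lda_1d(X, Y)))
    print("hit rate, winsorized training sample:", hit_rate(X, Y, fit_lda_1d(X_w, Y)))
```

On this toy data the F-max ratio falls from roughly 197 to about 2 after within-group winsorization, and the hit rate on the original observations rises from 0.9 to 0.95. Scoring both models on the same unwinsorized observations keeps the comparison from simply rewarding the altered training sample; a real analysis would also use several predictors and a hold-out or cross-validated hit rate.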
