Journal: Chemometrics and Intelligent Laboratory Systems

Comparison of the performance of multiclass classifiers in chemical data: Addressing the problem of overfitting with the permutation test


Abstract

The objective of this work was to apply different pattern recognition techniques to datasets commonly used as chemometric case studies, i.e., the Glass Identification Dataset and the Wine Quality Dataset. Three types of classification models were used. The first type was based on discriminant analysis and other linear classification models such as Linear Discriminant Analysis (LDA), Regularized Discriminant Analysis (RDA), Mixture Discriminant Analysis (MDA), and Partial Least Squares Discriminant Analysis (PLS-DA). The second type was based on nonlinear classification models such as Artificial Neural Networks (ANN), Support Vector Machine (SVM) with a radial kernel function, k-Nearest Neighbors (k-NN), Naive Bayes (NB), and Learning Vector Quantization (LVQ). The third type was based on classification trees and rule-based models such as Classification and Regression Tree (CART), Bagging, Random Forest (RF), C5.0, and Generalized Boosted Machine (GBM). The results obtained outperformed the classification results of works previously published in the literature. The computational experiments show that LVQ was the only method able to classify all three datasets correctly. Permutation tests were applied to check whether overfitting occurred. The results showed that overfitting was absent, which was confirmed by the pairwise Wilcoxon signed-rank test.
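The abstract does not describe how the permutation tests were carried out, so the following is only a minimal sketch of one common form of the idea, not the authors' procedure. It assumes a hold-out split, a simple nearest-centroid classifier standing in for the models listed above, and synthetic three-class data. All function names and parameters (`nearest_centroid_accuracy`, `permutation_test`, `make_toy_data`, `n_permutations`) are hypothetical. The training labels are shuffled many times, and the accuracy on the real labels is compared with the accuracies obtained after shuffling.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid classifier and return its test accuracy."""
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Assign the class whose centroid is closest (squared Euclidean).
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    hits = sum(predict(x) == label for x, label in zip(X_test, y_test))
    return hits / len(y_test)


def permutation_test(X_train, y_train, X_test, y_test,
                     n_permutations=200, seed=0):
    """Return (observed accuracy, permutation p-value).

    The p-value is the share of label-shuffled runs that score at least
    as well as the run on the real labels, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = nearest_centroid_accuracy(X_train, y_train, X_test, y_test)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(y_train)
        rng.shuffle(shuffled)  # break the feature-label relationship
        score = nearest_centroid_accuracy(X_train, shuffled, X_test, y_test)
        if score >= observed:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic, well-separated three-class data (illustration only)."""
    rng = random.Random(seed)
    centers = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (0.0, 5.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    order = list(range(len(y)))
    rng.shuffle(order)
    return [X[i] for i in order], [y[i] for i in order]


if __name__ == "__main__":
    X, y = make_toy_data()
    split = 60  # first 60 samples for training, the rest for testing
    observed, p_value = permutation_test(X[:split], y[:split],
                                         X[split:], y[split:])
    print(f"accuracy = {observed:.3f}, permutation p-value = {p_value:.4f}")
```

A small p-value indicates that the accuracy on the real labels is unlikely to arise when the labels carry no information, which is the sense in which such a test can argue against a model that merely fits noise. In practice the shuffling is often wrapped around a full cross-validation loop rather than a single hold-out split.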
