Journal of Chemometrics

Correlation-assisted nearest shrunken centroid classifier with applications for high dimensional spectral data

Abstract

High-throughput data are frequently observed in contemporary chemical studies, and classification based on spectral information is an important problem in chemometrics. Linear discriminant analysis (LDA) fails in the large-p-small-n situation (more variables p than samples n) for two main reasons: (1) the sample covariance matrix is singular when p > n, so it cannot be inverted, and (2) noise accumulates when the class centroids are estimated in a high-dimensional feature space. The Independence Rule is a class of methods that overcome these drawbacks by ignoring the correlation between spectral variables. However, strong correlation is an essential characteristic of spectral data. We propose a new correlation-assisted nearest shrunken centroid classifier (CA-NSC) that incorporates correlation information into the classification. CA-NSC combines two sources of information, the class centroid (mean) and the correlation structure (variance), to classify samples. We verified CA-NSC with two real data analyses and a simulation study. In addition to NSC, we compared CA-NSC with the soft independent modeling of class analogy (SIMCA) approach, which uses only correlation structure information for classification. The results show that CA-NSC consistently improves on both NSC and SIMCA; in one of the real data analyses, its misclassification rate is almost half that of NSC. In general, correlation among variables worsens the performance of NSC, even when the discriminatory information contained in the class centroids remains unchanged. If only correlation structure information is used (as in SIMCA), the results are satisfactory only when the correlation structure alone provides sufficient information for classification. Copyright (C) 2015 John Wiley & Sons, Ltd.
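The abstract contrasts LDA, which needs an invertible covariance matrix, with the Independence Rule that NSC follows. As an illustration only, here is a minimal Python/NumPy sketch of a standard nearest shrunken centroid classifier (diagonal covariance, soft-thresholded class centroids) applied to simulated large-p-small-n data. It is not the authors' CA-NSC, whose correlation-assisted step is not described here. The function names `fit_nsc` and `predict_nsc`, the shrinkage threshold `delta`, and the simulated data are hypothetical choices made for this example.

```python
import numpy as np


def fit_nsc(X, y, delta):
    """Fit a nearest shrunken centroid model (hypothetical helper).

    Uses a diagonal within-class covariance (the Independence Rule) and
    soft-thresholds each class centroid toward the overall centroid.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, K = X.shape[0], len(classes)
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    n_k = np.array([np.sum(y == k) for k in classes])

    # Pooled within-class standard deviation of each variable.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    scale = s + np.median(s)  # s0 = median(s) stabilizes small variances

    # Standardized distance of each class centroid from the overall centroid.
    m = np.sqrt(1.0 / n_k - 1.0 / n)[:, None]
    d = (centroids - overall) / (m * scale)

    # Soft thresholding: variables with |d| <= delta shrink to the overall mean.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    shrunk = overall + m * scale * d_shrunk
    priors = n_k / n
    return classes, shrunk, scale, priors


def predict_nsc(model, X_new):
    """Assign each row of X_new to the nearest shrunken centroid."""
    classes, shrunk, scale, priors = model
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    # Variance-scaled squared distance; correlations between variables are ignored.
    diff = (X_new[:, None, :] - shrunk[None, :, :]) / scale
    scores = (diff ** 2).sum(axis=2) - 2.0 * np.log(priors)
    return classes[np.argmin(scores, axis=1)]


# Simulated large-p-small-n data: 20 samples, 100 variables.
rng = np.random.default_rng(0)
n, p = 20, 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 2.0  # only the first 5 variables separate the classes

# The p x p sample covariance has rank at most n - 1, so LDA cannot invert it.
S = np.cov(X, rowvar=False)
print("rank of S:", np.linalg.matrix_rank(S), "of", p)

model = fit_nsc(X, y, delta=1.0)
pred = predict_nsc(model, X)
print("training accuracy:", np.mean(pred == y))
```

Raising `delta` shrinks more class centroids toward the overall centroid, which is how NSC limits the noise accumulation described above. Because the distance uses only per-variable scales, the sketch ignores correlation between variables entirely, which is the limitation CA-NSC is designed to address.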