Journal: Analytica chimica acta

Using variable combination population analysis for variable selection in multivariate calibration

Abstract

Variable (wavelength or feature) selection has become a critical step in the analysis of datasets with a large number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), is proposed. The strategy consists of two crucial procedures. First, an exponentially decreasing function (EDF), which follows the simple and effective principle of 'survival of the fittest' from Darwin's theory of natural evolution, is employed to determine the number of variables to keep and to continuously shrink the variable space. Second, in each EDF run, binary matrix sampling (BMS), which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of variable subsets from which a population of sub-models is constructed. Model population analysis (MPA) is then employed to find the variable subsets with lower root mean square error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed; the higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm-partial least squares (GA-PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research at: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750. (C) 2015 Elsevier B.V. All rights reserved.
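
The loop described in the abstract (EDF shrinkage, BMS sampling, and frequency counting over the best 10% of sub-models) can be sketched in Python as follows. This is a minimal illustration and not the authors' MATLAB implementation. The exact EDF form (kept count = P * exp(-theta * i)), every function name and default parameter, and the toy_score function are assumptions made for this example. In practice the score would be the RMSECV of a PLS sub-model built on the given variable subset.

```python
import math
import random


def edf_keep_counts(n_vars, n_final, n_runs):
    """Variables to keep after each EDF run.

    Assumed form: n_vars * exp(-theta * i), with theta chosen so that
    n_final variables remain after the last run.
    """
    theta = math.log(n_vars / n_final) / n_runs
    return [max(n_final, round(n_vars * math.exp(-theta * i)))
            for i in range(1, n_runs + 1)]


def binary_matrix_sampling(n_subsets, n_vars, ratio, rng):
    """Build an n_subsets x n_vars 0/1 matrix (list of rows).

    Every column holds the same number of ones, shuffled independently,
    so each variable is sampled equally often.
    """
    ones = round(n_subsets * ratio)
    columns = []
    for _ in range(n_vars):
        col = [1] * ones + [0] * (n_subsets - ones)
        rng.shuffle(col)
        columns.append(col)
    # Transpose: each row marks the variables of one subset.
    return [[columns[j][i] for j in range(n_vars)] for i in range(n_subsets)]


def vcpa_sketch(variables, score, n_final=4, n_runs=5, n_subsets=200,
                ratio=0.5, best_fraction=0.1, seed=0):
    """Shrink the variable space run by run; lower score means a better subset."""
    rng = random.Random(seed)
    remaining = list(variables)
    for keep in edf_keep_counts(len(remaining), n_final, n_runs):
        matrix = binary_matrix_sampling(n_subsets, len(remaining), ratio, rng)
        subsets = [[v for v, flag in zip(remaining, row) if flag]
                   for row in matrix]
        subsets = [s for s in subsets if s]  # skip empty subsets
        subsets.sort(key=score)              # best sub-models first
        n_best = max(1, int(len(subsets) * best_fraction))
        # Count how often each variable appears in the best sub-models.
        freq = {v: 0 for v in remaining}
        for subset in subsets[:n_best]:
            for v in subset:
                freq[v] += 1
        # Keep the most frequent variables for the next EDF run.
        remaining = sorted(remaining, key=lambda v: freq[v], reverse=True)[:keep]
    return remaining


def toy_score(subset):
    """Hypothetical stand-in for RMSECV: variables 3 and 7 are 'informative'."""
    informative = {3, 7}
    return len(informative - set(subset)) + 0.01 * len(subset)


selected = vcpa_sketch(range(20), toy_score)
print(selected)
```

With the toy score, subsets containing both informative variables always rank first, so those two variables accumulate the highest frequencies and survive every EDF run. Swapping toy_score for a cross-validated model error is the only change needed to try the sketch on real spectra.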