
Model selection for partial least squares calibration and implications for analysis of atmospheric organic aerosol samples with mid-infrared spectroscopy


Abstract

In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade-offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user-defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross-validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid-infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0-30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. (C) 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
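As a rough illustration of the constraint described in the abstract, the sketch below picks the smallest number of latent variables whose RMSECV stays within a user-chosen fractional increase of the minimum RMSECV. This is not the authors' penalty metric, whose formula is not given in the abstract, and it does not implement the sum-of-ranking-differences step; it only shows the kind of bound the penalty is said to provide. The function name `select_parsimonious_lv`, the parameter `alpha`, and the RMSECV values are all hypothetical.

```python
from typing import Sequence


def select_parsimonious_lv(rmsecv: Sequence[float], alpha: float) -> int:
    """Return the smallest number of latent variables (1-based) whose RMSECV
    is at most (1 + alpha) times the minimum RMSECV.

    Simplified stand-in for a parsimony criterion, NOT the paper's metric.
    rmsecv[i] is assumed to be the cross-validation error with i + 1 latent variables.
    """
    if not rmsecv:
        raise ValueError("rmsecv must not be empty")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    threshold = (1.0 + alpha) * min(rmsecv)
    for n_lv, err in enumerate(rmsecv, start=1):
        if err <= threshold:
            return n_lv
    # The minimum itself always satisfies the bound, so this is never reached.
    raise AssertionError("unreachable")


# Hypothetical RMSECV values for models with 1..8 latent variables.
rmsecv_curve = [0.90, 0.55, 0.40, 0.34, 0.31, 0.30, 0.305, 0.32]

# alpha = 0 reproduces the conventional minimum-RMSECV choice;
# larger alpha trades a bounded increase in RMSECV for fewer latent variables.
for alpha in (0.0, 0.1, 0.3):
    print(alpha, select_parsimonious_lv(rmsecv_curve, alpha))
```

With these made-up numbers, allowing a larger fractional increase moves the selection from 6 latent variables toward 4, which mirrors the trade-off between parsimony and cross-validation error that the abstract discusses.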
