
Interpolation and extrapolation problems of multivariate regression in analytical chemistry: benchmarking the robustness on near-infrared (NIR) spectroscopy data



Abstract

Modern analytical chemistry of industrial products needs rapid, robust, and cheap analytical methods to continuously monitor product quality parameters. For this reason, spectroscopic methods are often used to control the quality of industrial products in an on-line/in-line regime. Vibrational spectroscopy, including mid-infrared (MIR), Raman, and near-infrared (NIR), is one of the best ways to obtain information about the chemical structures and the quality coefficients of multicomponent mixtures. Together with chemometric algorithms and multivariate data analysis (MDA) methods, which were created specifically for the analysis of complicated, noisy, and overlapping signals, NIR spectroscopy shows great results in terms of accuracy, as measured by the classical root mean squared error of prediction (RMSEP). However, it is unclear whether the combined NIR + MDA methods can deal with the much more complex interpolation and extrapolation problems that are inevitably present in real-world applications. In the current study, we make a rather general comparison, in terms of robustness, of three families of regression methods: linear, such as partial least squares or projection to latent structures (PLS); “quasi-nonlinear”, such as the polynomial version of PLS (Poly-PLS); and intrinsically non-linear, such as artificial neural networks (ANNs), support vector regression (SVR), and least-squares support vector machines (LS-SVM/LSSVM). As a measure of robustness, we estimate their accuracy when solving interpolation and extrapolation problems. Petroleum and biofuel (biodiesel) systems were chosen as representative examples of real-world samples. Six chemical systems that differ widely in complexity, composition, structure, and properties were studied: gasoline, ethanol–gasoline biofuel, diesel fuel, aromatic solutions of petroleum macromolecules, petroleum resins in benzene, and biodiesel. Eighteen different sample sets were used in total.
General conclusions are drawn about the applicability of ANN- and SVM-based regression tools in modern analytical chemistry. The relative effectiveness of the multivariate algorithms changes when going from classical accuracy to robustness. Neural networks, which can produce very accurate results with respect to classical RMSEP, are not able to solve interpolation problems or, especially, extrapolation problems. The chemometric methods based on the support vector machine (SVM) approach are capable of solving both classical regression and interpolation/extrapolation tasks.
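The abstract does not say how the interpolation and extrapolation sample sets were built, so the following is only a minimal sketch of one plausible design: samples are held out by the value of the predicted property, either from a gap inside the calibration range (interpolation) or from outside it (extrapolation), and the held-out error is reported as RMSEP. The function names `range_split`, `linear_fit_predict`, and `robustness_rmsep`, the 1-D least-squares stand-in model, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over a held-out set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def range_split(y, low, high, mode):
    """Return (train_idx, test_idx) for a hold-out defined on the property range.

    mode="interpolation": test samples lie inside [low, high], train outside.
    mode="extrapolation": test samples lie outside [low, high], train inside.
    """
    inside = [i for i, v in enumerate(y) if low <= v <= high]
    outside = [i for i, v in enumerate(y) if not (low <= v <= high)]
    if mode == "interpolation":
        return outside, inside
    if mode == "extrapolation":
        return inside, outside
    raise ValueError("mode must be 'interpolation' or 'extrapolation'")


def linear_fit_predict(x_train, y_train, x_test):
    """Stand-in model (assumption): 1-D ordinary least squares, y = a*x + b."""
    n = len(x_train)
    mx = sum(x_train) / n
    my = sum(y_train) / n
    sxx = sum((x - mx) ** 2 for x in x_train)
    a = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train)) / sxx
    b = my - a * mx
    return [a * x + b for x in x_test]


def robustness_rmsep(x, y, low, high, mode, fit_predict=linear_fit_predict):
    """Fit on the training part of the split and score RMSEP on the held-out part."""
    train, test = range_split(y, low, high, mode)
    y_hat = fit_predict([x[i] for i in train],
                        [y[i] for i in train],
                        [x[i] for i in test])
    return rmsep([y[i] for i in test], y_hat)


# Toy data (made up): one spectral feature with a mildly curved response.
y = [float(v) for v in range(1, 11)]
x = [v + 0.05 * v * v for v in y]
print("interpolation RMSEP:", robustness_rmsep(x, y, 4.0, 6.0, "interpolation"))
print("extrapolation RMSEP:", robustness_rmsep(x, y, 3.0, 8.0, "extrapolation"))
```

Any regressor with the same `fit_predict(x_train, y_train, x_test)` shape could be passed in place of the stand-in, which would allow the kind of side-by-side comparison of methods that the abstract describes.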

Bibliographic details

  • Source
    The Analyst | 2012, Issue 7 | pp. 1604-1610 | 7 pages
  • Author affiliations

    . Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich, Switzerland;

    1. Unimilk Joint Stock Co., 143421 Moscow region, Russian Federation;

  • Original format: PDF
  • Language: English
