Journal: Chemometrics and Intelligent Laboratory Systems

Effects of data pre-processing methods on classification of ATR-FTIR spectra of pen inks using partial least squares-discriminant analysis (PLS-DA)

Abstract

In response to our review paper [L.C. Lee et al., Chemom. Intell. Lab. Syst. 163 (2017) 64-75], we present a study that explores the practical impacts of data preprocessing (DP) methods on ATR-FTIR spectra. Nine common DP methods, i.e. mean centering (MC), autoscaling (AS), Pareto scaling, robust scaling, multiplicative scatter correction (MSC), normalization to sum (NS), normalization to constant vector length (NV), standard normal variate and asymmetric least squares (AsLS), were chosen for the sake of their availability in the R software and their rather simple computation steps. An ATR-FTIR spectral dataset of blue gel pen inks originating from 10 different manufacturers (i.e. brands) was used in this work. The dataset is large (N = 1361), high dimensional (J = 5401), multi-class (C = 10), and imbalanced. In order to examine the impacts of substrate interferences, the global spectral region was further divided, arbitrarily, into three mutually exclusive local regions that were analyzed independently. Following that, the resulting four sub-datasets (i.e. one based on the global region and three based on local regions) were each preprocessed with the DP methods to produce 40 different sub-datasets, including the raw counterparts. Partial least squares-discriminant analysis (PLS-DA) was chosen to construct a series of 50 models by including the first 50 PLS components incrementally. The modeling was performed independently for each of the 40 sub-datasets. Each model was evaluated repeatedly using autoprediction, six variants of v-fold cross-validation (v = 2, 4, 5, 7, 10, 15) and external testing schemes. As a result, the empirical performance of each DP method is represented by 400 different error rates (8 model validation schemes × 50 models). The performance of each DP method was then compared against its raw counterpart according to summary statistics and hypothesis tests.
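As a rough illustration of what several of the named DP methods compute, the sketch below applies them to a spectra matrix with one spectrum per row and one wavenumber per column. It is written in Python with NumPy rather than the R software used in the study, and the function names, the toy data, and the exact variants chosen (sample standard deviation, division by the sum of absolute intensities for NS, the mean spectrum as the MSC reference) are assumptions, not details taken from the paper. Robust scaling and AsLS are left out here.

```python
import numpy as np

# Rows = spectra (samples), columns = wavenumbers (variables).
# All names below are hypothetical; variants are assumptions, not the paper's.

def mean_center(X):
    # MC: subtract each variable's mean.
    return X - X.mean(axis=0)

def autoscale(X):
    # AS: mean-center, then divide by each variable's standard deviation.
    # Assumes no variable is constant (non-zero standard deviation).
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    # Pareto: mean-center, then divide by the square root of the standard deviation.
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

def normalize_sum(X):
    # NS: scale each spectrum so its absolute intensities sum to 1.
    return X / np.abs(X).sum(axis=1, keepdims=True)

def normalize_vector(X):
    # NV: scale each spectrum to unit Euclidean length.
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def snv(X):
    # Standard normal variate: center and scale each spectrum individually.
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd

def msc(X, reference=None):
    # MSC: regress each spectrum on a reference spectrum (default: the mean
    # spectrum), then remove the fitted offset and divide by the fitted slope.
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty(X.shape, dtype=float)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)
        out[i] = (x - intercept) / slope
    return out

# Toy data: one underlying shape with different multiplicative and additive effects.
shape = np.linspace(1.0, 2.0, 6)
X = np.vstack([a * shape + b for a, b in [(1.0, 0.0), (1.5, 0.2), (0.8, -0.1)]])
X_msc = msc(X)  # every row collapses onto the mean spectrum for this toy case
```

The split between column-wise operations (MC, AS, Pareto) and row-wise operations (NS, NV, SNV, MSC) is the main design point: the former rescale variables across samples, while the latter correct each spectrum on its own.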
In addition, principal component analysis and hierarchical clustering analysis were employed to illustrate, respectively, the spatial distribution of and the similarity between the nine DP methods and the raw counterparts. Several important remarks were drawn from the rigorous comparative analyses. First, due to the inherent properties of ATR-FTIR spectra, DP methods that handle slope, e.g. MSC and AsLS, appeared to be the best-performing DP methods. Second, the normalization methods, NS and NV, ranked second. Third, MC shows no impact on the raw IR spectral dataset. Fourth, it is shown that outliers in the ATR-FTIR spectra of pen inks could be localized. Last but not least, removal of irrelevant signals arising from the sample substrate is best achieved via region truncation rather than via PLS or DP methods alone.
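To make the idea of a slope-handling DP method more concrete, here is a minimal sketch of an asymmetric least squares baseline estimate in Python with NumPy. It follows the commonly used iteratively reweighted, smoothness-penalized formulation; the abstract does not say which AsLS variant or settings were used, so the function name `asls_baseline` and the parameters `lam`, `p` and `n_iter` are assumptions for illustration only. A dense solver is used for brevity, which is only practical for short spectra.

```python
import numpy as np

def asls_baseline(y, lam=1e4, p=0.01, n_iter=10):
    """Estimate a smooth baseline under a spectrum y (hypothetical helper).

    lam    : smoothness penalty (larger gives a stiffer baseline), assumed value
    p      : asymmetry weight for points lying above the baseline, assumed value
    n_iter : number of reweighting iterations, assumed value
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator with shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        # Weighted, smoothness-penalized least squares fit of the baseline z.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (likely peaks) get the small weight p,
        # points at or below it get the large weight 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z

# Toy spectrum: a sloped baseline plus one narrow absorption band.
x = np.linspace(0.0, 1.0, 200)
slope_line = 0.5 + 2.0 * x
band = np.exp(-((x - 0.5) / 0.02) ** 2)
corrected = (slope_line + band) - asls_baseline(slope_line + band)
```

Because the penalty acts on second differences, a straight sloped baseline is not penalized at all, which is why this kind of correction targets slope and offset while leaving narrow bands largely intact.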
