Journal: The Analyst: The Analytical Journal of the Royal Society of Chemistry: A Monthly International Publication Dealing with All Branches of Analytical Chemistry

The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis

Abstract

Variable selection and outlier detection are important steps in chemical modeling. They usually affect each other, and the order in which they are performed also strongly affects the modeling results. Currently, many studies perform these steps separately and in different orders. In this study, we examined the interaction between outliers and variables and compared modeling procedures that apply variable selection and outlier detection in different orders. Because this order can affect the interpretation of the model, it is difficult to decide which order is preferable when the predictive performance (prediction error) of the different orders is relatively close. To address this problem, a simultaneous variable selection and outlier detection approach called Model Adaptive Space Shrinkage (MASS) was developed. The proposed approach is based on model population analysis (MPA). Through weighted binary matrix sampling (WBMS) from the model space, a large number of partial least squares (PLS) regression models were built, and the elite portion of the models was selected to statistically reassign the weight of each variable and sample. The whole process was then repeated until the weights of the variables and samples converged. Finally, MASS adaptively found a high-performance model consisting of the optimized variable subset and sample subset. The combination of these two subsets can be regarded as the cleaned dataset used for chemical modeling. The proposed approach thus avoids the problem of ordering variable selection and outlier detection. One near-infrared (NIR) spectroscopy dataset and one quantitative structure-activity relationship (QSAR) dataset were used to test the approach. The results demonstrated that MASS is a useful method for data cleaning before building a predictive model.
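
The loop described in the abstract (sample sub-models, keep the elite ones, reassign weights, repeat until convergence) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things below are assumptions made only for illustration: ordinary least squares stands in for PLS, each sub-model is scored by a random half-split holdout RMSE, and the function names (`wbms`, `holdout_rmse`, `mass_sketch`), starting weights, elite fraction, and tolerance are all hypothetical.

```python
import numpy as np

def wbms(weights, n_models, rng):
    """Weighted binary matrix sampling: entry (m, j) is True with probability weights[j]."""
    return rng.random((n_models, weights.size)) < weights

def holdout_rmse(X, y, rows, cols, rng):
    """Score one sub-model. Assumption: OLS replaces PLS, and a random
    half-split holdout replaces whatever validation the paper uses."""
    rows = rng.permutation(rows)
    half = rows.size // 2
    fit, val = rows[:half], rows[half:]
    A_fit = np.column_stack([np.ones(fit.size), X[np.ix_(fit, cols)]])
    A_val = np.column_stack([np.ones(val.size), X[np.ix_(val, cols)]])
    coef = np.linalg.lstsq(A_fit, y[fit], rcond=None)[0]
    return float(np.sqrt(np.mean((A_val @ coef - y[val]) ** 2)))

def mass_sketch(X, y, n_models=300, elite_frac=0.1, max_iter=15, tol=0.02, seed=0):
    """Return (variable weights, sample weights) after iterative reweighting."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    var_w = np.full(p, 0.5)  # assumed starting inclusion probability per variable
    smp_w = np.full(n, 0.8)  # assumed starting inclusion probability per sample
    n_elite = max(1, int(elite_frac * n_models))
    for _ in range(max_iter):
        V = wbms(var_w, n_models, rng)  # which variables each sub-model uses
        S = wbms(smp_w, n_models, rng)  # which samples each sub-model uses
        errors = np.full(n_models, np.inf)
        for m in range(n_models):
            cols, rows = np.flatnonzero(V[m]), np.flatnonzero(S[m])
            # Skip sub-models too small to fit and validate.
            if cols.size == 0 or rows.size < 2 * (cols.size + 2):
                continue
            errors[m] = holdout_rmse(X, y, rows, cols, rng)
        elite = np.argsort(errors)[:n_elite]
        # New weight = how often each variable/sample appears in elite sub-models.
        new_var_w, new_smp_w = V[elite].mean(axis=0), S[elite].mean(axis=0)
        delta = max(np.abs(new_var_w - var_w).max(), np.abs(new_smp_w - smp_w).max())
        var_w, smp_w = new_var_w, new_smp_w
        if delta < tol:  # weights have stopped changing
            break
    return var_w, smp_w
```

Calling `mass_sketch(X, y)` on a data matrix `X` (samples by variables) and response `y` returns two weight vectors. In this sketch, variables with weights near 1 would form the retained variable subset, and samples with weights near 0 would be treated as outlier candidates.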
