Journal: Advances and Applications in Bioinformatics and Chemistry

Performance of PLS regression coefficients in selecting variables for each response of a multivariate PLS for omics-type data

Abstract: Multivariate partial least squares (PLS) regression allows complex biological events to be modeled by considering different factors at the same time. It is unaffected by data collinearity, which makes it a valuable method for modeling high-dimensional biological data (such as those derived from genomics, proteomics and peptidomics). In the presence of multiple responses, it is of particular interest how to appropriately “dissect” the model to reveal the importance of single attributes with regard to individual responses (for example, for variable selection). In this paper, the performance of multivariate PLS regression coefficients in selecting relevant predictors for different responses in omics-type data was investigated by means of a receiver operating characteristic (ROC) analysis. For this purpose, simulated data mimicking the covariance structures of microarray and liquid chromatography mass spectrometric data were used to generate matrices of predictors and responses. The relevant predictors were set a priori. The influences of noise, the source of data with different covariance structure and the size of relevant predictors were investigated. Results demonstrate the applicability of PLS regression coefficients in selecting variables for each response of a multivariate PLS in omics-type data. Comparisons with other feature selection methods, such as variable importance in the projection scores, principal component regression, and least absolute shrinkage and selection operator regression, were also provided.
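
The evaluation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or simulation design. Everything in it is an assumption made for illustration: the function names `pls2_coefficients` and `auc`, the sample sizes, the use of independent Gaussian predictors (which do not mimic microarray or LC-MS covariance structures), and the choice of three components. It fits a multivariate PLS (PLS2) model with NumPy, takes the absolute regression coefficients of each response as variable scores, and computes a rank-based ROC AUC against the predictors that were set as relevant a priori.

```python
import numpy as np


def pls2_coefficients(X, Y, n_components):
    """Return PLS2 regression coefficients B, shape (n_predictors, n_responses)."""
    Xd = X - X.mean(axis=0)  # center predictors
    Yd = Y - Y.mean(axis=0)  # center responses
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of X'Y.
        w = np.linalg.svd(Xd.T @ Yd, full_matrices=False)[0][:, 0]
        t = Xd @ w                 # X scores
        tt = t @ t
        p_load = Xd.T @ t / tt     # X loadings
        q_load = Yd.T @ t / tt     # Y loadings
        Xd = Xd - np.outer(t, p_load)  # deflate X
        Yd = Yd - np.outer(t, q_load)  # deflate Y
        W.append(w)
        P.append(p_load)
        Q.append(q_load)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    # B = W (P'W)^{-1} Q'
    return W @ np.linalg.solve(P.T @ W, Q.T)


def auc(scores, is_relevant):
    """Rank-based ROC AUC (Mann-Whitney form); assumes no tied scores."""
    ranks = np.argsort(np.argsort(scores)) + 1
    n_pos = int(is_relevant.sum())
    n_neg = is_relevant.size - n_pos
    return (ranks[is_relevant].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical simulation: 100 samples, 200 predictors, 3 responses,
# each response driven by its own block of 10 relevant predictors.
rng = np.random.default_rng(0)
n, p, q, k = 100, 200, 3, 10
X = rng.normal(size=(n, p))
relevant = np.zeros((p, q), dtype=bool)
for j in range(q):
    relevant[j * k:(j + 1) * k, j] = True
Y = X @ relevant.astype(float) + rng.normal(size=(n, q))

B = pls2_coefficients(X, Y, n_components=3)
# One AUC per response: how well |coefficient| ranks the relevant predictors.
aucs = [auc(np.abs(B[:, j]), relevant[:, j]) for j in range(q)]
print(aucs)
```

An AUC near 1 for a response means its coefficients rank the predictors set as relevant above the irrelevant ones, and a value near 0.5 means the ranking is no better than chance. Rerunning the sketch with a different noise level or block size `k` gives a rough feel for the kind of factors the abstract says were studied.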