Journal: Bioinformatics

Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data


Abstract

Motivation: A vast literature from the past decade is devoted to relating gene profiles to subject survival or time to cancer recurrence. Biomarker discovery from high-dimensional data, such as transcriptomic or single nucleotide polymorphism profiles, is a major challenge in the search for more precise diagnoses. The proportional hazards regression model proposed by Cox (1972), which relates the time to event to a set of covariates in the presence of censoring, is the most commonly used model for the analysis of survival data. However, like multivariate regression, it assumes that there are more observations than variables, that the data are complete, and that the variables are not strongly correlated. In practice, when dealing with high-dimensional data, these constraints are crippling. Collinearity gives rise to over-fitting and model misidentification. Variable selection can improve estimation accuracy by effectively identifying the subset of relevant predictors, and can enhance model interpretability through a parsimonious representation. To deal with both collinearity and variable selection, many methods based on least absolute shrinkage and selection operator (LASSO) penalized Cox proportional hazards models have been proposed since the reference paper of Tibshirani. Regularization can also be performed through dimension reduction, as is the case with partial least squares (PLS) regression. We propose two original algorithms, sPLSDR and its non-linear kernel counterpart DKsPLSDR, which use sparse PLS regression (sPLS) based on deviance residuals. We compared their predictive performance with state-of-the-art algorithms on both simulated and real reference benchmark datasets.
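To make the idea of "sparse PLS on deviance residuals" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the deviance residuals come from a Cox model with no covariates (so the cumulative hazard is the Nelson-Aalen estimate) and that sparsity is obtained by soft-thresholding a single PLS weight vector. The function names, the `eta` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def null_cox_deviance_residuals(time, event):
    """Deviance residuals of a covariate-free Cox model (assumed setup)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Nelson-Aalen cumulative hazard, evaluated at each subject's time.
    event_times = np.unique(time[event == 1])
    deaths = np.array([np.sum((time == t) & (event == 1)) for t in event_times])
    at_risk = np.array([np.sum(time >= t) for t in event_times])
    increments = deaths / at_risk
    cumhaz = np.array([increments[event_times <= t].sum() for t in time])
    # Martingale residual: observed event indicator minus expected count.
    martingale = event - cumhaz
    # log(cumhaz) is only needed for subjects with an event (cumhaz > 0 there).
    safe_log = np.log(np.where(cumhaz > 0, cumhaz, 1.0))
    inner = -2.0 * (martingale + event * safe_log)
    # Clip tiny negative rounding errors before taking the square root.
    return np.sign(martingale) * np.sqrt(np.maximum(inner, 0.0))

def sparse_pls_direction(X, y, eta=0.5):
    """First PLS weight vector, soft-thresholded at eta * max |X^T y|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    z = Xc.T @ yc                      # covariance-like scores per variable
    thresh = eta * np.max(np.abs(z))
    w = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Toy, made-up data: 6 subjects, 4 predictors.
rng = np.random.default_rng(0)
time = np.array([5.0, 3.0, 8.0, 1.0, 6.0, 2.0])
event = np.array([1, 0, 1, 1, 0, 1])
X = rng.normal(size=(6, 4))

residuals = null_cox_deviance_residuals(time, event)
weights = sparse_pls_direction(X, residuals, eta=0.5)
component = (X - X.mean(axis=0)) @ weights  # one latent score per subject
print(residuals)
print(weights)
```

In this sketch the censored response is replaced by a continuous residual, so an ordinary (sparse) PLS step can be applied to it; variables whose weight is thresholded to zero are dropped from the component. A full method would extract several components and then fit a Cox model on them, which is not shown here.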
