Journal: Computational and Mathematical Methods in Medicine

Comparison of Variable Selection Methods for Time-to-Event Data in High-Dimensional Settings

Abstract

Over the last decades, molecular signatures have become increasingly important in oncology and are opening up a new area of personalized medicine. Nevertheless, the biological relevance of these signatures and the statistical tools used to develop them have been called into question in the literature. Here, we investigate six typical selection methods for high-dimensional settings and survival endpoints, including LASSO and some of its extensions, component-wise boosting, and random survival forests (RSF). A resampling algorithm based on data splitting was applied to nine high-dimensional simulated datasets to assess selection stability on training sets and the intersection between selection methods. Prognostic performance was evaluated on the respective validation sets. Finally, an application to a real breast cancer dataset is presented. The false discovery rate (FDR) was high for each selection method, and the intersection between lists of predictors was very poor. RSF selected many more variables than the other methods and was therefore less efficient on validation sets. Due to the complex correlation structure in genomic data, stability in the selection procedure is generally poor for selected predictors, but can be improved with a larger training sample size. In a very high-dimensional setting, we recommend the LASSO-pcvl method, since it outperforms the other methods by reducing the number of selected genes and minimizing FDR in most scenarios. Nevertheless, this method still gives a high rate of false positives. Further work is thus necessary to propose new methods that overcome this issue when numerous predictors are present. Multidisciplinary discussion between clinicians and statisticians is necessary to ensure both the statistical and the biological relevance of the predictors included in molecular signatures.
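The evaluation criteria named in the abstract (FDR, selection stability across resampled training sets, and intersection between lists of predictors) can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption: FDR is taken as the share of selected variables that are not true predictors, stability as the fraction of resampled training sets in which a variable is selected, and the intersection is summarized with a Jaccard index, which is a stand-in measure and not necessarily the one the authors used. The gene names and selection lists are made-up toy values, not results from the study.

```python
from collections import Counter


def false_discovery_rate(selected, true_predictors):
    """Share of selected variables that are not true predictors.

    Returns 0.0 when nothing is selected (an assumed convention).
    """
    selected = set(selected)
    if not selected:
        return 0.0
    false_positives = selected - set(true_predictors)
    return len(false_positives) / len(selected)


def selection_frequency(selections):
    """Fraction of resampled training sets in which each variable was selected."""
    counts = Counter(v for sel in selections for v in set(sel))
    n_splits = len(selections)
    return {v: c / n_splits for v, c in counts.items()}


def overlap(list_a, list_b):
    """Jaccard index between two lists of selected variables (assumed measure)."""
    a, b = set(list_a), set(list_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data: hypothetical true predictors and the variables one method
# selected on four resampled training sets.
true_predictors = {"g1", "g2", "g3"}
selections = [
    ["g1", "g2", "g7"],
    ["g1", "g9"],
    ["g1", "g2", "g3", "g8"],
    ["g2", "g7"],
]

fdr_per_split = [false_discovery_rate(s, true_predictors) for s in selections]
stability = selection_frequency(selections)
first_two_overlap = overlap(selections[0], selections[1])
```

In a simulation the true predictors are known, so an FDR of this kind can be computed directly; on a real dataset only the stability and overlap summaries are available.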
