The Stream Algorithm: Computationally Efficient Ridge-Regression via Bayesian Model Averaging and Applications to Pharmacogenomic Prediction of Cancer Cell Line Sensitivity

Abstract

Computational efficiency is important for learning algorithms operating in the “large p, small n” setting. In computational biology, the analysis of data sets containing tens of thousands of features (“large p”) but only a few hundred samples (“small n”) is nowadays routine, and regularized regression approaches such as ridge-regression, lasso, and elastic-net are popular choices. In this paper we propose a novel and highly efficient Bayesian inference method for fitting ridge-regression. Our method is fully analytical and bypasses the need for expensive tuning-parameter optimization via cross-validation by employing Bayesian model averaging over a grid of tuning parameters. Additional computational efficiency is achieved by adopting the singular value decomposition re-parametrization of the ridge-regression model, which replaces computationally expensive inversions of large p × p matrices with efficient inversions of small, diagonal n × n matrices. We show in simulation studies and in the analysis of two large cancer cell line data panels that our algorithm achieves slightly better predictive performance than cross-validated ridge-regression while requiring only a fraction of the computation time. Furthermore, in comparisons based on the cell line data sets, our algorithm systematically outperforms the lasso in both predictive performance and computation time, and matches the predictive performance of the elastic-net at considerably lower computation time.
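
The two ideas named in the abstract, the SVD re-parametrization and averaging over a grid of tuning parameters, can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the function name `ridge_bma_svd`, the fixed and known noise variance `sigma2`, the Gaussian prior on the coefficients with variance `sigma2 / lambda`, and the uniform prior over the grid are all simplifying assumptions made here for illustration. Under those assumptions the marginal likelihood of each grid value factorizes over the rotated responses, so only the n singular values are needed and no p × p matrix is ever formed.

```python
import numpy as np


def ridge_bma_svd(X, y, lambdas, sigma2=1.0):
    """Average ridge fits over a grid of penalties (illustrative sketch).

    Assumptions (not taken from the paper): noise variance sigma2 is known,
    beta ~ N(0, (sigma2 / lambda) I), and the grid has a uniform prior.
    """
    lambdas = np.asarray(lambdas, dtype=float)

    # Thin SVD: X = U diag(d) Vt. For n < p, U is n x n and Vt is n x p.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    z = U.T @ y  # responses rotated into the singular basis

    # Ridge solution for each lambda_k: beta_k = V diag(d / (d^2 + lambda_k)) U^T y.
    shrink = d / (d**2 + lambdas[:, None])   # shape (K, n)
    betas = (shrink * z) @ Vt                # shape (K, p)

    # Marginally, z_i ~ N(0, sigma2 * (1 + d_i^2 / lambda_k)), independently.
    var = sigma2 * (1.0 + d**2 / lambdas[:, None])
    log_ml = -0.5 * np.sum(np.log(2.0 * np.pi * var) + z**2 / var, axis=1)

    # Posterior weights over the grid (uniform prior), computed stably.
    weights = np.exp(log_ml - log_ml.max())
    weights /= weights.sum()

    return weights @ betas, weights, betas


# Small synthetic "large p, small n" example (made-up data).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)
grid = [0.1, 1.0, 10.0]
beta_avg, weights, betas = ridge_bma_svd(X, y, grid)
```

Each row of `betas` is the ordinary ridge estimate for one grid value, and `beta_avg` is their weighted combination, so no cross-validation loop is required in this simplified setting.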