Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data

Abstract

Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those from the Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. That work revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely used single-gene markers based on genomics data.

Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors, and show that this is a more realistic validation than standard k-fold cross-validation.

Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers identified by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally poor recall (i.e. they correctly detect only a small fraction of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. The drugs with the most predictive of these models include pyrimethamine, sunitinib and 17-AAG.

Conclusions: Thanks to this unbiased validation, we now know that this type of model can predict in vitro tumour response to some of these drugs. These models can thus be further investigated on in vivo tumour models. R code to facilitate the construction of alternative machine learning models and their validation in the presented benchmark is available at .
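The precision versus recall trade-off described in the results can be illustrated with a small sketch. It is not the authors' R code: it is written in Python for self-containment, the ten cell-line labels and both sets of predictions are invented toy values, and the helper names are hypothetical. The abstract does not say which metric summarises "overall classification performance"; the Matthews correlation coefficient (MCC) is used here purely as an assumption.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = sensitive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_mcc(y_true, y_pred):
    """Compute precision, recall and MCC; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Toy test set (invented): 10 cell lines, 1 = sensitive to the drug.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical single-gene marker: flags only the one mutant cell line.
single_gene = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical multi-gene classifier: finds most sensitive lines, one false alarm.
multi_gene = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

for name, pred in [("single-gene", single_gene), ("multi-gene", multi_gene)]:
    p, r, m = precision_recall_mcc(y_true, pred)
    print(f"{name}: precision={p:.2f} recall={r:.2f} MCC={m:.2f}")
```

In this toy case the single-gene marker never raises a false alarm but misses four of the five sensitive cell lines, while the multi-gene predictions trade a little precision for much higher recall and a higher MCC. That mirrors the qualitative pattern the abstract reports, not its actual numbers.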
