Journal: Nucleic Acids Research

DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies

Abstract

Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistency and are prone to false positives. Here, we developed an approach (DriverML) integrating Rao’s score test and supervised machine learning to identify cancer driver genes. The weight parameters in the score statistics quantified the functional impacts of mutations on the protein. To obtain optimized weight parameters, the score statistics of prior driver genes were maximized on pan-cancer training data. We conducted rigorous and unbiased benchmark analysis and comparisons of DriverML with 20 other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA). Our comprehensive evaluations demonstrated that DriverML was robust and powerful among various datasets and outperformed the other tools with a better balance of precision and sensitivity. In vitro cell-based assays further proved the validity of the DriverML prediction of novel driver genes. In summary, DriverML uses an innovative, machine learning-based approach to prioritize cancer driver genes and provides dramatic improvements over currently existing methods. Its source code is available at https://github.com/HelloYiHan/DriverML.
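
The abstract describes a score statistic whose weight parameters reflect the functional impact of each mutation type. The following is a minimal sketch of that general idea, not the DriverML implementation: it assumes a toy model in which the count of mutations of type t in a gene is Poisson with mean e_t * exp(beta * w_t), where e_t is the expected background count and w_t is a weight. Under that assumption, Rao's score test of beta = 0 reduces to the weighted statistic below. The function names, the Poisson model, and the example numbers are all hypothetical; in DriverML the weights are learned from pan-cancer training data rather than set by hand.

```python
import math


def weighted_score_z(observed, expected, weights):
    """Signed Rao score statistic for H0: beta = 0 in the assumed toy model
    n_t ~ Poisson(e_t * exp(beta * w_t)).

    observed: mutation counts per mutation type (n_t)
    expected: background counts per mutation type under the null (e_t)
    weights:  functional-impact weights per mutation type (w_t)
    """
    if not (len(observed) == len(expected) == len(weights)):
        raise ValueError("observed, expected and weights must have equal length")
    # Score: derivative of the Poisson log-likelihood in beta, evaluated at 0.
    score = sum(w * (n - e) for n, e, w in zip(observed, expected, weights))
    # Fisher information for beta at beta = 0.
    info = sum(w * w * e for e, w in zip(expected, weights))
    if info <= 0:
        raise ValueError("Fisher information must be positive")
    return score / math.sqrt(info)


def upper_tail_p(z):
    """One-sided p-value from the standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical gene: counts for three mutation types (e.g. silent-like,
    # missense-like, truncating-like), with made-up background and weights.
    z = weighted_score_z([12, 5, 3], [4.0, 1.0, 0.5], [0.2, 1.0, 1.5])
    print(z, upper_tail_p(z))
```

A large positive statistic means the gene carries more high-weight mutations than the background model predicts. Choosing the weights so that known driver genes score highly is the supervised step the abstract refers to; it is not shown here.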