Journal: Bioinformatics

A gradient-boosting approach for filtering de novo mutations in parent-offspring trios


Abstract

Motivation: Whole-genome and -exome sequencing on parent-offspring trios is a powerful approach to identifying disease-associated genes by detecting de novo mutations in patients. Accurate detection of de novo mutations from sequencing data is a critical step in trio-based genetic studies. Existing bioinformatic approaches usually yield high error rates due to sequencing artifacts and alignment issues, which may either miss true de novo mutations or call too many false ones, making downstream validation and analysis difficult. In particular, current approaches have much worse specificity than sensitivity, and developing effective filters to discriminate genuine from spurious de novo mutations remains an unsolved challenge.

Results: In this article, we curated 59 sequence features in whole-genome and exome alignment context which are considered to be relevant to discriminating true de novo mutations from artifacts, and then employed a machine-learning approach to classify candidates as true or false de novo mutations. Specifically, we built a classifier, named De Novo Mutation Filter (DNMFilter), using gradient boosting as the classification algorithm. We built the training set using experimentally validated true and false de novo mutations as well as false de novo mutations collected from an in-house large-scale exome-sequencing project. We evaluated DNMFilter's theoretical performance and investigated the relative importance of different sequence features on the classification accuracy. Finally, we applied DNMFilter to our in-house whole-exome trios and one CEU trio from the 1000 Genomes Project and found that DNMFilter could be coupled with commonly used de novo mutation detection approaches as an effective filtering approach to significantly reduce the false discovery rate without sacrificing sensitivity.

Availability: The software DNMFilter, implemented using a combination of Java and R, is freely available from the website at http://humangenome.duke.edu/software
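The filtering idea described in the Results can be illustrated with a small, self-contained sketch of gradient boosting for binary classification. This is not the DNMFilter implementation (which the abstract says is written in Java and R) and it does not use the paper's 59 features. The three feature names, the synthetic data, and all parameter values below are assumptions made purely for illustration. The sketch boosts one-split decision stumps on the negative gradient of the logistic loss and then scores candidate mutations.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares fit of a one-split regression stump to the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        raise ValueError("no valid split: all features are constant")
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_gradient_boosting(X, y, n_rounds=25, learning_rate=0.5):
    """Boost stumps on the negative gradient of the log loss (y - p)."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1.0 - p))  # initial score: log-odds of the true class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(stump_predict(s, row) for s in stumps))

def clamp01(v):
    return min(1.0, max(0.0, v))

# Synthetic candidates with three HYPOTHETICAL features (not from the paper):
# [child alt-allele fraction, max parental alt-allele fraction, mapping quality]
def make_candidate(is_true):
    if is_true:
        return [clamp01(random.gauss(0.5, 0.08)),
                abs(random.gauss(0.0, 0.01)),
                random.gauss(58, 3)]
    return [clamp01(random.gauss(0.2, 0.1)),
            abs(random.gauss(0.05, 0.04)),
            random.gauss(40, 10)]

y = [1] * 60 + [0] * 60
X = [make_candidate(label == 1) for label in y]

model = fit_gradient_boosting(X, y)

# Keep only candidates whose predicted probability of being genuine exceeds 0.5
kept = [row for row in X if predict_proba(model, row) > 0.5]
train_accuracy = sum((predict_proba(model, row) > 0.5) == (label == 1)
                     for row, label in zip(X, y)) / len(y)
print(f"kept {len(kept)} of {len(X)} candidates; training accuracy {train_accuracy:.2f}")
```

In practice the probability cutoff would be tuned on validated calls to trade sensitivity against false discovery rate; the 0.5 used here is only a placeholder.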
