...
Journal: BMC Genomics

EFIN: predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome

Abstract

Background: Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more novel variants are discovered. Each genome carries thousands of rare or private AAS, of which only a very small number are related to an underlying disease, so bioinformatics analysis is essential for identifying AAS that may cause or contribute to human disease and merit further analysis. Existing algorithms in this field still have a high false-prediction rate, and new methods are needed to take full advantage of the vast amount of genomic data available.

Results: Here we report a novel algorithm that features two innovations: (1) making better use of sequence conservation information by grouping homologous protein sequences into six blocks according to their evolutionary distance to human and evaluating sequence conservation in each block independently, and (2) including as many such homologous sequences as possible in the analysis. Random forests are used to evaluate sequence conservation in each block and to predict the potential impact of an AAS on protein function. Testing this algorithm on a comprehensive dataset showed a significant improvement in prediction accuracy over currently widely used programs. The algorithm and a web-based tool implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), have been made freely available to the public (http://paed.hku.hk/efin/).

Conclusions: Grouping homologous sequences into blocks according to the evolutionary distance of each species to human, and evaluating sequence conservation in each block independently, significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human health and disease.
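The block-wise grouping described in the Results can be illustrated with a minimal sketch. It is not the EFIN implementation: the distance cut-offs in `BOUNDARIES`, the function names, and the use of a simple match fraction as the conservation measure are all assumptions made for illustration. The abstract states that random forests evaluate conservation within each block, and it does not give the block definitions.

```python
from bisect import bisect_right

# Hypothetical distance cut-offs: five boundaries define six blocks.
# The abstract does not state how the real blocks are delimited.
BOUNDARIES = [0.1, 0.2, 0.4, 0.8, 1.6]


def assign_block(distance, boundaries=BOUNDARIES):
    """Return the 0-based block index for an evolutionary distance to human."""
    return bisect_right(boundaries, distance)


def block_conservation(human_residue, homologs, boundaries=BOUNDARIES):
    """Per-block fraction of homolog residues that match the human residue.

    homologs: iterable of (distance_to_human, residue_at_aligned_position).
    Alignment gaps ('-') are skipped; a block with no sequences yields None.
    The match fraction is a stand-in for the per-block evaluation step.
    """
    n_blocks = len(boundaries) + 1
    matches = [0] * n_blocks
    totals = [0] * n_blocks
    for distance, residue in homologs:
        if residue == "-":
            continue
        block = assign_block(distance, boundaries)
        totals[block] += 1
        if residue == human_residue:
            matches[block] += 1
    return [m / t if t else None for m, t in zip(matches, totals)]


if __name__ == "__main__":
    # Made-up alignment column: (distance to human, residue at the site).
    column = [(0.05, "R"), (0.08, "R"), (0.15, "R"), (0.3, "K"),
              (0.5, "R"), (0.9, "K"), (1.0, "-"), (2.0, "Q")]
    print(block_conservation("R", column))
```

Each position then yields one value per block. A classifier such as a random forest could take these six values as input features instead of a single conservation score pooled over all homologs, which is the idea the abstract credits for the accuracy gain.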