BMC Medical Genomics

Computational identification of deleterious synonymous variants in human genomes using a feature-based approach


Abstract

Although synonymous single nucleotide variants (sSNVs) do not alter protein sequences, they have been shown to play an important role in human disease. Distinguishing pathogenic sSNVs from neutral ones is challenging because pathogenic sSNVs tend to have low prevalence. Although many methods have been developed for predicting the functional impact of single nucleotide variants, only a few have been specifically designed for identifying pathogenic sSNVs. In this work, we describe a computational model, IDSV (Identification of Deleterious Synonymous Variants), which uses a random forest (RF) to detect deleterious sSNVs in human genomes. We systematically investigate a total of 74 multifaceted features across seven categories: splicing, conservation, codon usage, sequence, pre-mRNA folding energy, translation efficiency, and functional region annotation. Then, to remove redundant and irrelevant features and improve prediction performance, we apply feature selection using the sequential backward selection method. Based on the 10 selected features, an RF classifier is developed to identify deleterious sSNVs. Results on benchmark datasets show that IDSV outperforms other state-of-the-art methods in identifying pathogenic sSNVs. We have thus developed an efficient feature-based prediction approach (IDSV) for deleterious sSNVs that draws on a wide variety of features. From the full feature set, we identify a compact and useful subset with important implications for identifying deleterious sSNVs. Our results indicate that, besides splicing and conservation features, a new translation efficiency feature is also informative for identifying deleterious sSNVs. While the functional region annotation and sequence features are only weakly informative on their own, they may help discriminate deleterious sSNVs from benign ones when combined with other features.
The data and source code are available at http://bioinfo.ahu.edu.cn:8080/IDSV .
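
The sequential backward selection step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the IDSV implementation: the data are synthetic (6 features rather than 74), the function names are hypothetical, and a leave-one-out nearest-centroid accuracy stands in for the random forest performance that the paper would use to score each candidate subset. The sketch only shows the greedy loop: start from all features and repeatedly drop the one whose removal leaves the best-scoring subset.

```python
import random


def loo_centroid_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`.

    Stand-in scorer (an assumption for this sketch): the paper scores
    feature subsets with a random forest, which is not reimplemented here.
    """
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        sums, counts = {}, {}
        for j, (other, other_label) in enumerate(zip(rows, labels)):
            if j == i:
                continue  # hold out sample i
            total = sums.setdefault(other_label, [0.0] * len(subset))
            for k, f in enumerate(subset):
                total[k] += other[f]
            counts[other_label] = counts.get(other_label, 0) + 1

        def sq_dist(cls):
            # Squared distance from the held-out row to the class centroid.
            return sum((row[f] - sums[cls][k] / counts[cls]) ** 2
                       for k, f in enumerate(subset))

        correct += min(sums, key=sq_dist) == label
    return correct / len(rows)


def sequential_backward_selection(rows, labels, n_keep, score):
    """Greedily drop one feature at a time until `n_keep` features remain."""
    n_features = len(rows[0])
    if not 1 <= n_keep <= n_features:
        raise ValueError("n_keep must be between 1 and the number of features")
    selected = list(range(n_features))
    while len(selected) > n_keep:
        # Each candidate is the current subset with exactly one feature removed.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        # Keep the candidate that scores best (ties go to the first one).
        selected = max(candidates, key=lambda s: score(rows, labels, s))
    return selected


# Synthetic data (hypothetical): features 0 and 1 are shifted by class,
# features 2-5 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(60)]
rows = [[rng.gauss(1.5 * y if f < 2 else 0.0, 1.0) for f in range(6)]
        for y in labels]

selected = sequential_backward_selection(rows, labels, 2, loo_centroid_accuracy)
print("kept feature indices:", selected)
```

Because `score` is passed in as a parameter, a cross-validated random forest metric could be substituted without changing the selection loop. The loop evaluates on the order of the square of the feature count in subsets, so the cost of the scorer dominates the running time.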
