Journal: Nucleic Acids Research

Conserved codon composition of ribosomal protein coding genes in Escherichia coli, Mycobacterium tuberculosis and Saccharomyces cerevisiae: lessons from supervised machine learning in functional genomics



Abstract

Genomics projects have resulted in a flood of sequence data. Functional annotation currently relies almost exclusively on inter-species sequence comparison and is restricted in cases of limited data from related species and widely divergent sequences with no known homologs. Here, we demonstrate that codon composition, a fusion of codon usage bias and amino acid composition signals, can accurately discriminate, in the absence of sequence homology information, cytoplasmic ribosomal protein genes from all other genes of known function in Saccharomyces cerevisiae, Escherichia coli and Mycobacterium tuberculosis using an implementation of support vector machines, SVMlight. Analysis of these codon composition signals is instructive in determining features that confer individuality to ribosomal protein genes. Each of the sets of positively charged, negatively charged and small hydrophobic residues, as well as codon bias, contributes to their distinctive codon composition profile. The representation of all these signals is sensitively detected, combined and augmented by the SVMs to perform an accurate classification. Of special mention is an obvious outlier, yeast gene RPL22B, highly homologous to RPL22A but employing very different codon usage, perhaps indicating a non-ribosomal function. Finally, we propose that codon composition be used in combination with other attributes in gene/protein classification by supervised machine learning algorithms.
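
To make the notion of a codon composition feature vector concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a gene is represented by the relative frequency of each of the 64 codons (stop codons included), read in frame from the coding sequence; the abstract does not specify the exact encoding, so that choice is an assumption. The function names `codon_composition` and `to_svmlight_line` and the toy sequence are hypothetical, introduced only for illustration. The second function writes one example in the sparse `<target> <index>:<value>` text format that SVMlight reads.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every gene maps to the same feature layout.
# Assumption: stop codons are kept; the abstract does not say how they were handled.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_composition(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    # Read in frame from the first base; ignore a trailing partial codon.
    usable = len(cds) - len(cds) % 3
    triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Skip triplets containing ambiguous bases such as N.
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def to_svmlight_line(label, features):
    """Format one example as '<target> <index>:<value> ...' (1-based, zeros omitted)."""
    pairs = [f"{i}:{v:.6f}" for i, v in enumerate(features, start=1) if v != 0.0]
    return " ".join([f"{label:+d}"] + pairs)


# Toy usage with a made-up sequence; +1 would mark a ribosomal protein gene.
toy_gene = "ATGAAAAAGCGTTAA"
print(to_svmlight_line(+1, codon_composition(toy_gene)))
```

Because the vector counts codons and not amino acids, it carries both signals named in the abstract: the amino acid makeup of the protein and the choice among synonymous codons.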
