BMC Bioinformatics

Species-specific protein sequence and fold optimizations

Abstract

Background: An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino acid signatures that make species-specific protein adaptation possible across 100 complete genomes.

Results: Correspondence analysis of the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaeal, 76 bacterial and 7 eukaryotic complete genomes identified environmental niche as a significant factor in variability. Additionally, clustering by amino acid composition revealed groups of phylogenetically unrelated archaea and bacteria that share similar environments. Composition analyses of conservative, domain-based homology modeling suggested an enrichment of the small hydrophobic residues Ala, Gly and Val and the charged residues Asp, Glu, His and Arg across all genomes. In contrast, the larger aromatic residues Phe, Trp and Tyr are reduced in folds, and these results were not affected by low-complexity biases. We derived two simple log-odds scoring functions for each of the complete genomes, one from ORFs (CG) and one from folds (CF). CF achieved an average cross-validation success rate of 85 ± 8%, whereas CG detected 73 ± 9% of species-specific sequences when competing against all other non-redundant CG. Continuously updated results are available at http://genome.mshri.on.ca.

Conclusion: Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.
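
The abstract does not state the exact form of the log-odds scoring functions, so the following is only a minimal sketch of how a composition-based log-odds score of this kind might work. It assumes a per-residue score of log(genome frequency / pooled background frequency), a pseudocount of 1, and assignment of a query to the highest-scoring genome. The function names, the toy genomes `toy_A` and `toy_B`, and their sequences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences, pseudocount=1.0):
    """Amino acid frequencies over a set of sequences (pseudocount is an assumption)."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def log_odds_table(genome_seqs, background_seqs):
    """Per-residue log-odds of the genome composition against a background."""
    fg = composition(genome_seqs)
    bg = composition(background_seqs)
    return {a: math.log(fg[a] / bg[a]) for a in AMINO_ACIDS}

def score(seq, table):
    """Sum the per-residue log-odds over a query; unknown symbols contribute 0."""
    return sum(table.get(ch, 0.0) for ch in seq.upper())

def assign(seq, tables):
    """Return the name of the genome whose table scores the query highest."""
    return max(tables, key=lambda name: score(seq, tables[name]))

# Hypothetical toy data: two "genomes" with very different residue usage.
genomes = {
    "toy_A": ["AAAGGVVA", "GAVAAG"],
    "toy_B": ["KKEEDDRK", "EEKRDD"],
}
# Background here is simply every sequence pooled together (an assumption).
pooled = [s for seqs in genomes.values() for s in seqs]
tables = {name: log_odds_table(seqs, pooled) for name, seqs in genomes.items()}

print(assign("AGVAAG", tables))
print(assign("KEDRKE", tables))
```

In a cross-validation setting like the one the abstract reports, the query sequences would be held out when building the tables, and the success rate would be the fraction of held-out sequences assigned back to their own genome.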
