Published in: Asia-Pacific Bioinformatics Conference

CLASSIFICATION OF PROTEIN 3D FOLDS BY HIDDEN MARKOV LEARNING ON SEQUENCES OF STRUCTURAL ALPHABETS


Abstract

Fragment-based analysis of protein three-dimensional (3D) structures has received increased attention in recent years. Here, we used a set of pentamer local structure alphabets (LSAs) recently derived in our laboratory to represent protein structures, i.e., we transformed the 3D structures into one-dimensional (1D) sequences of LSAs. We then applied Hidden Markov Model training to these LSA sequences to assess their ability to capture features characteristic of 43 populated protein folds. In the size range of LSAs examined (5 to 41 alphabets), performance was optimal with 20 alphabets, giving a fold-classification accuracy of 82% in a 5-fold cross-validation on training-set structures sharing < 40% pairwise sequence identity at the amino acid level. For test-set structures, the accuracy was as high as for the training set, but fell to 65% for those sharing no more than 25% amino acid sequence identity with the training-set structures. These results suggest that sufficient 3D information can be retained during the drastic 3D-to-1D transformation for it to serve as a framework for developing efficient and useful structural bioinformatics tools.
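
The classification step described above can be illustrated with a minimal sketch. The abstract does not state the HMM topology, the training procedure, or the decision rule, so everything here is an assumption: one small HMM per fold, scored with the forward algorithm, and a query sequence assigned to the fold whose model gives the highest log-likelihood. The fold names, the 3-letter alphabet (instead of 20), and all probabilities are hypothetical toy values, not the authors' parameters.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def to_log(rows):
    """Convert a matrix of strictly positive probabilities to log space."""
    return [[math.log(p) for p in row] for row in rows]

def forward_log_likelihood(seq, log_start, log_trans, log_emit):
    """Return log P(seq | HMM) using the forward algorithm in log space.

    seq is a non-empty list of integer symbol indices (here: LSA letters).
    """
    n_states = len(log_start)
    # Initialisation: start in state s and emit the first symbol.
    alpha = [log_start[s] + log_emit[s][seq[0]] for s in range(n_states)]
    # Recursion: sum over all predecessor states, then emit the next symbol.
    for symbol in seq[1:]:
        alpha = [
            log_sum_exp([alpha[p] + log_trans[p][s] for p in range(n_states)])
            + log_emit[s][symbol]
            for s in range(n_states)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)

def classify_fold(seq, models):
    """Pick the fold whose HMM assigns seq the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(seq, *models[name]))

# Hypothetical two-state HMMs over a toy 3-letter structural alphabet {0, 1, 2}.
# Each entry is (log start probabilities, log transitions, log emissions).
models = {
    "fold_A": (
        [math.log(0.5), math.log(0.5)],
        to_log([[0.9, 0.1], [0.1, 0.9]]),
        to_log([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    ),
    "fold_B": (
        [math.log(0.5), math.log(0.5)],
        to_log([[0.5, 0.5], [0.5, 0.5]]),
        to_log([[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
    ),
}

if __name__ == "__main__":
    for query in ([0, 0, 0, 1, 1], [2, 2, 1, 2]):
        print(query, "->", classify_fold(query, models))
```

In a realistic setting the per-fold parameters would be estimated from training LSA sequences (for example with Baum-Welch), and the alphabet size would be one of the values the study compares; neither detail is specified in the abstract.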
