Source: Journal of Computer Science and Technology (English edition)

Decoding the Structural Keywords in Protein Structure Universe

         

Abstract

Although the protein sequence-structure gap continues to widen with the development of high-throughput sequencing tools, the protein structure universe appears to be approaching completeness: no proteins with novel structural folds have been deposited in the Protein Data Bank (PDB) recently. In this work, we identify a protein structural dictionary (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues, which serve as structural "keywords" that can effectively distinguish between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures into the major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary with approximately 400 4- to 20-residue Frag-K fragments is capable of classifying major SCOP folds with high accuracy.
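The idea of using dictionary fragments as structural features can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not describe how fragments are compared, so the sketch assumes C-alpha coordinates as input, a pairwise-distance descriptor for each fragment, and nearest-keyword assignment. All function names (`sliding_fragments`, `distance_profile`, `profile_rmsd`, `keyword_counts`) are hypothetical. The spectral clustering and random forest steps are omitted; the output is only the kind of count vector that a fold classifier could take as input.

```python
import math
from collections import Counter


def sliding_fragments(ca_coords, length):
    """Return every contiguous window of `length` C-alpha coordinates."""
    return [ca_coords[i:i + length] for i in range(len(ca_coords) - length + 1)]


def distance_profile(fragment):
    """Describe a fragment by all pairwise C-alpha distances.

    Assumption of this sketch: the descriptor is invariant to rotation and
    translation (and also to mirroring, which a real method may need to handle).
    """
    n = len(fragment)
    return [math.dist(fragment[i], fragment[j])
            for i in range(n) for j in range(i + 1, n)]


def profile_rmsd(p, q):
    """Root-mean-square difference between two equal-length distance profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


def keyword_counts(ca_coords, dictionary):
    """Count how often each dictionary fragment is the best match in a chain.

    `dictionary` maps a keyword name to its fragment coordinates. Windows are
    compared only against keywords of the same length.
    """
    by_length = {}
    for name, frag in dictionary.items():
        by_length.setdefault(len(frag), []).append((name, distance_profile(frag)))

    counts = Counter()
    for length, keywords in by_length.items():
        for frag in sliding_fragments(ca_coords, length):
            prof = distance_profile(frag)
            best_name, _ = min(keywords, key=lambda kw: profile_rmsd(prof, kw[1]))
            counts[best_name] += 1

    # One feature per keyword, in dictionary order.
    return [counts[name] for name in dictionary]
```

Calling `keyword_counts(chain, dictionary)` on each structure in a labeled set would yield one fixed-length feature vector per protein, which is the form of input a classifier such as a random forest expects.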
