Conference: IEEE Symposium on Computer Applications and Industrial Electronics

Classification of twilight zone proteins using a structure-based phylogenetic approach

Abstract

Advances in drug discovery have heightened the need to classify proteins in order to understand their structure, function and evolutionary relationships. Because protein sequences change readily over the course of evolution, homology between distantly related proteins is difficult to identify from sequence alone. Such proteins are nevertheless known to be structurally homologous, so a structural approach is better suited to them. This study focused on methods for classifying twilight zone proteins using a structure-based phylogenetic approach. Since protein homology plays a major role in protein classification, finding the best alignment tool is the most crucial step. The classification was constructed by clustering 15 folds at their superfamily level. These proteins belonged to four main SCOPe classes: all-alpha proteins (Class A), all-beta proteins (Class B), wound alpha-beta proteins (Class C) and mixed alpha-beta proteins (Class D). Protein homology was identified using the structural alignment tools FATCAT-F and FATCAT-R, while sequence alignment was conducted using T-COFFEE. The classification tree was constructed using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), and the clusters were validated using the Adjusted Rand Index (ARI), a pseudo-jackknife confidence interval and manual observation of the clusters. The results show that the structural approach classified better than the sequence-based method, producing clusters that more closely resembled SCOPe for three of the four main SCOPe classes (Class A, Class C and Class D). Moreover, FATCAT-R clustered proteins more accurately than FATCAT-F, with higher ARI values for a majority of protein folds. On the other hand, T-COFFEE clustered Class B proteins more accurately than both FATCAT-F and FATCAT-R.
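
The clustering and validation steps named in the abstract can be illustrated with a minimal pure-Python sketch: UPGMA (average-linkage) clustering over a pairwise distance matrix, followed by an Adjusted Rand Index comparison against reference labels. This is not the authors' pipeline. The function names (`upgma`, `adjusted_rand_index`), the toy distance matrix and the superfamily labels `sf1`/`sf2` are all hypothetical, and the abstract does not say how FATCAT or T-COFFEE alignment scores were converted into distances, so the matrix below is simply assumed as given.

```python
from itertools import combinations
from math import comb


def upgma(dist, n_clusters):
    """Average-linkage (UPGMA) clustering of items 0..n-1 into flat clusters.

    dist is a symmetric n x n list of lists of pairwise distances.
    Returns one integer cluster label per item.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        # UPGMA distance: unweighted mean over all cross-cluster item pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index between two labelings (needs at least 2 items)."""
    n = len(truth)
    cells, rows, cols = {}, {}, {}
    for t, p in zip(truth, pred):
        cells[(t, p)] = cells.get((t, p), 0) + 1  # contingency table
        rows[t] = rows.get(t, 0) + 1              # reference class sizes
        cols[p] = cols.get(p, 0) + 1              # predicted cluster sizes
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (index - expected) / (max_index - expected)


# Hypothetical distances for six proteins from two assumed superfamilies.
toy_dist = [
    [0.00, 0.20, 0.30, 0.90, 0.80, 0.90],
    [0.20, 0.00, 0.25, 0.85, 0.90, 0.80],
    [0.30, 0.25, 0.00, 0.90, 0.85, 0.90],
    [0.90, 0.85, 0.90, 0.00, 0.10, 0.20],
    [0.80, 0.90, 0.85, 0.10, 0.00, 0.15],
    [0.90, 0.80, 0.90, 0.20, 0.15, 0.00],
]
reference = ["sf1", "sf1", "sf1", "sf2", "sf2", "sf2"]

predicted = upgma(toy_dist, n_clusters=2)
score = adjusted_rand_index(reference, predicted)
print(predicted, score)
```

Because the toy matrix separates the two groups cleanly, the recovered clusters match the reference labels and the ARI is 1.0. Values near 0 would indicate agreement no better than chance. In practice a library routine such as SciPy's average-linkage clustering would replace the quadratic loop above, which is written for readability rather than speed.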
