Conference: Asia-Pacific Bioinformatics Conference

Automatic classification of protein structures using low-dimensional structure space mappings

Abstract

Background: Protein function is closely intertwined with protein structure. Discovering meaningful structure-function relationships is of central importance in protein biochemistry and has led to the creation of high-quality, manually curated classification databases such as the gold-standard SCOP (Structural Classification of Proteins) database. SCOP and counterparts such as CATH provide a detailed and comprehensive description of the structural and evolutionary relationships among proteins of known structure and are widely used in structural and computational biology. Because manual classification is both subjective and highly laborious, automated classification of novel structures is an increasingly active area of research. The need for such methods has grown with the recent explosion in the number of solved structures produced by various structural biology initiatives. In this paper we propose an approach to structure classification based on creating and tessellating low-dimensional maps of the protein structure space (MPSS). Given a set of protein structures, an MPSS is a low-dimensional embedding of structural similarity-based distances between the molecules. In an MPSS, the proteins under consideration (such as all the proteins in the PDB, or sub-samplings thereof) are represented as a point cloud, and structural relatedness maps to spatial adjacency of the points. We present methods and results showing that MPSS can be used to create tessellations of the protein space comparable to the clade systems within SCOP. Although we use SCOP as the gold standard, the proposed approach is equally applicable to other structural classifications.

Methods: We first construct MPSS from pairwise alignment distances obtained from four established structure alignment algorithms (CE, Dali, FATCAT and MATT). The low-dimensional embeddings are then computed using an embedding technique called multidimensional scaling (MDS). Next, using the remotely homologous Superfamily and Fold levels of the hierarchical SCOP database, a distance threshold is determined that relates adjacency in the low-dimensional map to functional relationships. The optimal threshold is the value that maximizes the total true classification rate vis-a-vis the SCOP classification. We also show that determining such a threshold is often straightforward once the structural relationships are represented using MPSS.

Results and conclusion: We demonstrate that MPSS constitute highly accurate representations of protein fold space and enable automatic classification of SCOP Superfamily and Fold-level relationships. The results from our automatic classification approach are remarkably similar to those found in the distantly homologous Superfamily level and the quite remotely homologous Fold level of SCOP. The significance of our results is underlined by the …
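
The two Methods steps, embedding pairwise distances with MDS and then choosing a distance threshold against reference labels, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses classical (Torgerson) MDS, which may differ from the MDS variant used in the paper; it reads "total true classification rate" as the sum of the true positive and true negative rates over protein pairs; and the function names `classical_mds` and `best_threshold`, as well as the four-point toy distance matrix, are hypothetical stand-ins for real alignment distances and SCOP labels.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed points so that Euclidean
    distances approximate the given pairwise distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop negative eigenvalues
    return eigvecs[:, top] * scale

def best_threshold(coords, labels):
    """Sweep candidate cutoffs over embedded pairwise distances and return
    the one maximizing TPR + TNR against the reference labels.
    (Assumed reading of 'total true classification rate'.)"""
    labels = np.asarray(labels)
    iu = np.triu_indices(len(labels), k=1)        # each unordered pair once
    diff = coords[:, None, :] - coords[None, :, :]
    pair_dist = np.sqrt((diff ** 2).sum(axis=-1))[iu]
    same = (labels[:, None] == labels[None, :])[iu]
    best_t, best_score = None, -np.inf
    for t in np.unique(pair_dist):
        pred = pair_dist <= t                     # pairs called "related" at cutoff t
        tpr = (pred & same).sum() / max(same.sum(), 1)
        tnr = (~pred & ~same).sum() / max((~same).sum(), 1)
        if tpr + tnr > best_score:
            best_t, best_score = t, tpr + tnr
    return best_t, best_score

# Toy stand-in for alignment distances: two tight groups far apart.
x = np.array([0.0, 1.0, 10.0, 11.0])
toy_dist = np.abs(x[:, None] - x[None, :])
coords = classical_mds(toy_dist)
t, score = best_threshold(coords, ["a", "a", "b", "b"])
```

In this toy case the two groups are cleanly separated, so a single cutoff classifies every pair correctly; with real structure-alignment distances the sweep would trade true positives against true negatives.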
