

Towards Comprehensive Structural Motif Mining for Better Fold Annotation in the 'Twilight Zone' of Sequence Dissimilarity



Abstract

Here we report a novel graph database mining method called APGM (APproximate Graph Mining) and demonstrate its application to protein structure pattern identification and structure classification. We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as the substitution matrices studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. The biological motivation of our study is to recognize common structure patterns in "immunoevasins", proteins that mediate virus evasion of host immune defense. We investigated two immunologically relevant protein domain families: the Immunoglobulin V set and the Immunoglobulin C1 set. We collected proteins from SCOP release 1.69. For each family we used the PISCES server to create a culled set of proteins with maximal pairwise sequence identity below 30%. We combined these proteins and randomly selected proteins to create training and testing data sets. We compared our method with an exact graph mining method, MGM, on classification accuracy. For the Immunoglobulin C1 set, classification based on features identified by MGM reaches only 73%, while APGM achieves between 69% and 91%. For the Immunoglobulin V set, the exact match method cannot mine any meaningful patterns and therefore fails in classification, while APGM achieves an accuracy of about 78%. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in any domain where subgraph labels have some uncertainty.
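
The idea of approximate matching under a compatibility matrix can be illustrated with a small sketch. This is not the APGM algorithm or its software: the scoring rule (product of per-node compatibility scores), the threshold `tau`, the toy labels, the score values, and all function names are assumptions made for illustration, and the brute-force search is only practical for tiny graphs.

```python
from itertools import permutations

# Hypothetical compatibility scores between node labels (1.0 = identical).
# The values are illustrative and not taken from any real substitution matrix.
COMPAT = {
    ("A", "A"): 1.0, ("A", "G"): 0.8, ("G", "G"): 1.0,
    ("L", "L"): 1.0, ("L", "I"): 0.9, ("I", "I"): 1.0,
}

def compat(a, b):
    """Symmetric lookup; unlisted label pairs are treated as incompatible."""
    return COMPAT.get((a, b), COMPAT.get((b, a), 0.0))

def make_graph(labels, edges):
    """Undirected labeled graph: (node -> label dict, set of 2-node frozensets)."""
    return labels, {frozenset(e) for e in edges}

def best_match_score(pattern, graph):
    """Best score over injective, edge-preserving mappings (brute force)."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    best = 0.0
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Every pattern edge must map onto an edge of the target graph.
        if not all(frozenset(mapping[n] for n in e) in g_edges for e in p_edges):
            continue
        # Assumed scoring rule: multiply the label compatibility of each node.
        score = 1.0
        for u in p_nodes:
            score *= compat(p_labels[u], g_labels[mapping[u]])
        best = max(best, score)
    return best

def support(pattern, graphs, tau):
    """Fraction of graphs that contain the pattern with score >= tau (tau > 0)."""
    hits = sum(best_match_score(pattern, g) >= tau for g in graphs)
    return hits / len(graphs)

if __name__ == "__main__":
    pattern = make_graph({0: "A", 1: "L"}, [(0, 1)])
    graphs = [
        make_graph({0: "A", 1: "L", 2: "G"}, [(0, 1), (1, 2)]),
        make_graph({0: "G", 1: "I"}, [(0, 1)]),
        make_graph({0: "L", 1: "I"}, [(0, 1)]),
    ]
    print("exact support:", support(pattern, graphs, tau=1.0))
    print("approximate support:", support(pattern, graphs, tau=0.7))
```

With `tau = 1.0` only identically labeled occurrences count, which corresponds to exact matching; lowering `tau` lets occurrences with compatible but non-identical labels contribute to a pattern's support.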


Bibliographic details

Source: Asia-pacific bioinformatics conference | 2009 | 1 page
Original format: PDF
Chinese Library Classification: Bioengineering (biotechnology)

Similar literature

1. A protein block based fold recognition method for the annotation of twilight zone sequences [J]. Suresh V., Ganesan K., Parthasarathy S. Protein and Peptide Letters. 2013, No. 3

2. PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences [J]. K. Ganesan, S. Parthasarathy. Journal of Structural and Functional Genomics. 2011, No. 4

3. Towards Comprehensive Structural Motif Mining for Better Fold Annotation in the "Twilight Zone" of Sequence Dissimilarity [C]. Yi Jia, Vincent Buhr, Jintao Zhang, The 7th Asia-Pacific Bioinformatics Conference. 2009

4. Genome scale fold assignment: Exploring fold, function and the twilight zone [D]. Mallick, Parag Kumar. 2002

5. Towards comprehensive structural motif mining for better fold annotation in the twilight zone of sequence dissimilarity [O]. Yi Jia, Jun Huan, Vincent Buhr, 2009

6. Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity [O]. Zhang Jintao, Buhr Vincent, Huan Jun, 2009
