Towards Comprehensive Structural Motif Mining for Better Fold Annotation in the 'Twilight Zone' of Sequence Dissimilarity


Abstract

Here we report a novel graph database mining method called APGM (APproximate Graph Mining) and demonstrate its application to protein structure pattern identification and structure classification. We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as the substitution matrices studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complex, noisy protein structure data. The biological motivation of our study is to recognize common structure patterns in "immunoevasins", proteins that mediate virus evasion of host immune defense. We investigated two immunologically relevant protein domain families: the Immunoglobulin V set and the Immunoglobulin C1 set. We collected proteins from SCOP release 1.69. For each family we used the PISCES server to create a culled set of proteins with maximal pairwise sequence identity below 30%. We combined these proteins with randomly selected proteins to create training and test data sets. We compared the classification accuracy of our method with that of an exact graph mining method, MGM. For the Immunoglobulin C1 set, classification based on features identified by MGM reaches at most 73% accuracy, while APGM achieves between 69% and 91%. For the Immunoglobulin V set, the exact match method cannot mine any meaningful patterns and therefore fails in classification, while APGM achieves an accuracy of about 78%. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in any domain where subgraph labels have some uncertainty.
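The idea of approximate label matching can be illustrated with a small sketch. This is not the APGM algorithm or its software; it is a brute-force toy that assumes a pattern "occurs" in a graph when some edge-preserving mapping of its nodes has a product of label compatibility scores at or above a threshold. The `COMPAT` values, the multiplicative scoring, and all function names are assumptions made for illustration only.

```python
from itertools import permutations

# Hypothetical compatibility scores between node labels (e.g. residue types).
# The numbers are invented for illustration, not taken from a real substitution matrix.
COMPAT = {
    ("A", "A"): 1.0, ("G", "G"): 1.0, ("L", "L"): 1.0, ("I", "I"): 1.0,
    ("L", "I"): 0.8, ("A", "G"): 0.6,
}

def compat(a, b):
    """Symmetric lookup; label pairs not listed are treated as incompatible (0.0)."""
    return COMPAT.get((a, b), COMPAT.get((b, a), 0.0))

def best_match_score(pattern, graph):
    """Best product of label compatibilities over all injective, edge-preserving
    mappings of pattern nodes onto graph nodes; 0.0 if no such mapping exists."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    g_edge_set = {frozenset(e) for e in g_edges}
    p_nodes = list(p_labels)
    best = 0.0
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Every pattern edge must map onto an existing graph edge.
        if any(frozenset((mapping[u], mapping[v])) not in g_edge_set
               for u, v in p_edges):
            continue
        score = 1.0
        for node in p_nodes:
            score *= compat(p_labels[node], g_labels[mapping[node]])
        best = max(best, score)
    return best

def approximate_support(pattern, database, threshold):
    """Fraction of graphs whose best match score reaches the threshold."""
    hits = sum(best_match_score(pattern, g) >= threshold for g in database)
    return hits / len(database)

# A graph is (node -> label, list of undirected edges).
pattern = ({0: "A", 1: "L"}, [(0, 1)])
database = [
    ({0: "A", 1: "L", 2: "G"}, [(0, 1), (1, 2)]),  # contains an exact A-L edge
    ({0: "G", 1: "I"}, [(0, 1)]),                  # only an approximate G-I edge
    ({0: "A", 1: "L"}, []),                        # right labels, but no edge
]

exact = approximate_support(pattern, database, threshold=1.0)
relaxed = approximate_support(pattern, database, threshold=0.4)
```

With a threshold of 1.0 only the exactly labeled edge counts, so the pattern is supported by one graph in three; lowering the threshold to 0.4 also admits the G-I edge, raising support to two in three. This mirrors, in miniature, how tolerating compatible labels can surface patterns that exact matching misses.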


Bibliographic record

Source
The 7th Asia-Pacific Bioinformatics Conference | 2009 | 874 | 1 page in total
Conference location: Beijing (CN)
Authors
Yi Jia; Vincent Buhr; Jintao Zhang; Jun Huan; Leonidas N. Carayannopoulos
Original format: PDF
CLC classification: QRTP

Similar literature


1. A protein block based fold recognition method for the annotation of twilight zone sequences [J]. Suresh V., Ganesan K., Parthasarathy S. Protein and Peptide Letters. 2013, No. 3

2. PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences [J]. K. Ganesan, S. Parthasarathy. Journal of Structural and Functional Genomics. 2011, No. 4

3. Towards Comprehensive Structural Motif Mining for Better Fold Annotation in the "Twilight Zone" of Sequence Dissimilarity [C]. Asia-Pacific Bioinformatics Conference. 2009

4. Genome scale fold assignment: Exploring fold, function and the twilight zone [D]. Mallick, Parag Kumar. 2002

5. Towards comprehensive structural motif mining for better fold annotation in the twilight zone of sequence dissimilarity [O]. Yi Jia, Jun Huan, Vincent Buhr. 2009

6. Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity [O]. Zhang Jintao, Buhr Vincent, Huan Jun. 2009
