Motif-based mining of protein sequences.

Abstract

We introduce CASTOR, an automatic, unsupervised system for protein motif discovery and classification. Given amino acid sequences for a group of proteins, CASTOR generates statistically significant motifs and constructs a classification of the proteins by performing motif discovery and refinement in a top-down and recursive manner. The members of each class are likely to share a function, and the motifs associated with the class are likely to account for the function.

We evaluate CASTOR's performance on the G protein-coupled receptor (GPCR) superfamily. The results show that the CASTOR-constructed classification is in better agreement with a manually curated classification than one constructed by another automatic, unsupervised system based on pairwise, global sequence similarity. Furthermore, while manually constructed classifications tend to be hierarchical, the CASTOR-constructed ones that are non-hierarchical suggest that complex functional relationships among classes may be more abundant than expected.

We also apply CASTOR to the mammalian olfactory receptor family, for which very little functional information is available. We infer the potential functional roles associated with the generated motifs and classes by integrating various complex data, such as mutation experiments and ligand binding assays. Among other functional insights gained, we obtain results that support previous hypotheses on structural integrity and post-translational modification. We also propose and provide evidence for a combinatorial molecular mechanism that supports and potentially explains the ligand binding behavior. We additionally define sub-sequences that capture structural features of these receptors and study the motifs present in the sub-sequences.

Finally, we introduce CASTOR+, an automatic, supervised system for protein classification. CASTOR+ adds new proteins to a pre-existing classification where each class is associated with specific motifs, such as that generated by CASTOR, by matching selected motifs in the given classification against each new protein. We evaluate the performance of CASTOR+ on the GPCR superfamily. We find that it performs almost as well as an approach based on pairwise, global sequence similarity in terms of classifying proteins against the bottom level of the manually curated classification. Furthermore, it often succeeds even as the other approach fails when the new proteins have no close homologues in the pre-existing classification.
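
The motif-matching idea described for CASTOR+ can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the abstract does not say how motifs are represented, selected, or scored, so the sketch assumes that motifs are plain regular expressions and that a protein is assigned to the class whose motifs it matches in the largest fraction. The class names, the motif patterns, the `threshold` parameter, and the functions `match_fraction` and `classify` are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical class-to-motif table. The patterns are illustrative only
# and are not motifs reported in the dissertation.
CLASS_MOTIFS: Dict[str, List[str]] = {
    "class_A": [r"DRY", r"NP..Y"],
    "class_B": [r"C.{2}C", r"GW.{3}P"],
}


def match_fraction(sequence: str, motifs: List[str]) -> float:
    """Return the fraction of motifs that occur at least once in the sequence."""
    if not motifs:
        return 0.0
    hits = sum(1 for motif in motifs if re.search(motif, sequence))
    return hits / len(motifs)


def classify(
    sequence: str,
    class_motifs: Dict[str, List[str]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Assign the sequence to the class with the highest motif match fraction.

    Returns None when no class reaches the (assumed) threshold, so that a
    protein is left unclassified rather than forced into a poor match.
    """
    best_class: Optional[str] = None
    best_score = 0.0
    for name, motifs in class_motifs.items():
        score = match_fraction(sequence, motifs)
        if score > best_score:
            best_class, best_score = name, score
    return best_class if best_score >= threshold else None
```

Calling `classify("MKTDRYLAVNPLIYAA", CLASS_MOTIFS)` would compare the sequence against each class's patterns in turn. Because the comparison uses only the motifs and not whole-sequence alignment, a scheme of this kind does not depend on the new protein having a close homologue in the existing classification, which is the property the abstract highlights.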

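Bibliographic record

  • Author

    Liu, Agatha H.

  • Author affiliation

    University of Washington.

  • Degree-granting institution University of Washington.
  • Subject Computer Science; Biology, Molecular.
  • Degree Ph.D.
  • Year 2002
  • Pages 140 p.
  • Total pages 140
  • Original format PDF
  • Language of text English
  • Chinese Library Classification Automation and computer technology; molecular genetics
  • Keywords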
