Journal of Molecular Biology

Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability

Abstract

Protein functional annotation relies on the identification of accurate relationships between proteins, with sequence divergence being a key factor. This is especially evident when distant protein relationships can be demonstrated only from three-dimensional structures. To address this challenge, we describe a computational approach that purposefully bridges gaps between related protein families through the directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another additionally containing the designed sequences. Searches of the augmented database showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established a sequence continuum that demonstrated 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold.
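The abstract does not spell out how the roulette wheel-based design step works. As a rough illustration only, the sketch below applies generic roulette wheel (weight-proportionate) selection to a profile: each aligned column is assumed to carry residue weights, and one residue is drawn per column with probability proportional to its weight. The names `roulette_pick`, `design_sequence`, and `toy_profile`, and the numbers in the toy profile, are hypothetical and are not taken from the paper.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_pick(weights, rng):
    """Pick one key of `weights` with probability proportional to its weight."""
    residues = list(weights)
    cumulative = list(accumulate(weights[r] for r in residues))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    spin = rng.random() * total  # where the "wheel" stops
    # First slot whose cumulative weight exceeds the spin; the min() guards
    # against a floating-point spin that rounds up to the total.
    index = min(bisect_right(cumulative, spin), len(residues) - 1)
    return residues[index]


def design_sequence(profile, rng):
    """Draw one residue per profile column to build an artificial sequence."""
    return "".join(roulette_pick(column, rng) for column in profile)


# Hypothetical three-column profile (assumed values, for illustration only).
toy_profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"K": 0.7, "R": 0.3},
]

rng = random.Random(0)  # seeded so repeated runs give the same draws
designed = [design_sequence(toy_profile, rng) for _ in range(5)]
print(designed)
```

Sampling by weight rather than always taking the most frequent residue yields many distinct sequences that still follow the column preferences, which is the general property a linker-design step of this kind would need.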