
Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins



Abstract

Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for generating 'protein-like' sequences that serve to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix built from multiply aligned sequences of bona fide family members, serves as the starting point of the algorithm. At each position in the sequence, the observed amino acid propensities and a randomly drawn number together dictate which residue is selected. By systematically applying this 'roulette-wheel' selection approach at each position, we generate sequences that resemble the parent family and thus enlarge the sequence space around the family. When such sequences are generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that including designed sequences in a database increases the sensitivity of methods such as PSI-BLAST and Cascade PSI-BLAST, and offers a promising opportunity for substantially improved remote homology recognition using sequence information alone.
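The roulette-wheel step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy four-position profile, and the use of plain per-position propensity dictionaries (rather than a real position-specific scoring matrix derived from an alignment) are all assumptions made for illustration.

```python
import random
from itertools import accumulate


def roulette_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity.

    `propensities` maps residue letters to non-negative weights for a
    single alignment position (a hypothetical, simplified profile column).
    """
    # Keep only residues that can actually be drawn.
    items = [(res, p) for res, p in propensities.items() if p > 0]
    if not items:
        raise ValueError("position has no residue with positive propensity")
    # Cumulative weights mark the slice boundaries on the wheel.
    cumulative = list(accumulate(p for _, p in items))
    spin = rng.random() * cumulative[-1]
    for (residue, _), upper in zip(items, cumulative):
        if spin < upper:
            return residue
    # Reached only if floating-point rounding pushes `spin` to the total.
    return items[-1][0]


def generate_sequences(profile, count, seed=None):
    """Generate `count` sequences, spinning the wheel once per position."""
    rng = random.Random(seed)
    return [
        "".join(roulette_pick(column, rng) for column in profile)
        for _ in range(count)
    ]


# Toy profile (assumed values, not taken from the paper).
TOY_PROFILE = [
    {"M": 1.0},
    {"A": 0.5, "G": 0.3, "S": 0.2},
    {"L": 0.6, "I": 0.3, "V": 0.1},
    {"K": 0.7, "R": 0.3},
]

if __name__ == "__main__":
    for seq in generate_sequences(TOY_PROFILE, 5, seed=42):
        print(seq)
```

Residues with higher propensity occupy a wider slice of the wheel and are drawn more often, so the generated sequences stay family-like while still varying from one draw to the next. Passing a fixed `seed` makes a run reproducible.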

Bibliographic details

  • Source
    Molecular BioSystems, 2012, Issue 8, pp. 2076-2084 (9 pages)
  • Author affiliations

    National Centre for Biological Sciences, UAS-GKVK Campus, Bangalore 560 065, India, Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India;

    IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560 012, India;

    Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India, Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India;

    Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India;

    National Centre for Biological Sciences, UAS-GKVK Campus, Bangalore 560 065, India;

    Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India;

  • Original format: PDF
  • Language: English
