Published in: Proceedings of the National Academy of Sciences of the United States of America

Automatic generation of primary sequence patterns from sets of related protein sequences.

Abstract

We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
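Step (ii) of the abstract, collapsing an aligned pair into one common pattern, can be illustrated with a minimal sketch. The class hierarchy below (`CLASSES`), the symbol names, the `.` gap character, and the function names `minimal_class` and `merge_aligned` are all assumptions made for illustration; they are not the classes or notation used by the authors. The alignment itself (step i) is taken as given, and no gap penalties are modelled.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical nested hierarchy: any two classes are either disjoint or
# one contains the other. These groupings are illustrative only.
CLASSES = {
    "a": frozenset("ILVM"),        # aliphatic
    "r": frozenset("FYW"),         # aromatic
    "h": frozenset("ILVMFYW"),     # hydrophobic = a + r
    "n": frozenset("DE"),          # negatively charged
    "p": frozenset("KRH"),         # positively charged
    "c": frozenset("DEKRH"),       # charged = n + p
    "o": frozenset("ST"),          # hydroxyl
    "l": frozenset("DEKRHSTNQ"),   # polar = c + o + N, Q
    "X": frozenset(AMINO_ACIDS),   # any residue
}
GAP = "."  # assumed gap character: matches 0 or 1 residue of any type


def members(symbol):
    """Return the set of residues a pattern symbol stands for."""
    if symbol in CLASSES:
        return CLASSES[symbol]
    return frozenset(symbol)  # a single amino acid letter


def minimal_class(x, y):
    """Smallest symbol covering both x and y (residues or class symbols)."""
    if x == y:
        return x
    needed = members(x) | members(y)
    # "X" covers everything, so the candidate list is never empty.
    covering = [(len(s), name) for name, s in CLASSES.items() if needed <= s]
    return min(covering)[1]


def merge_aligned(row_a, row_b):
    """Build one common pattern from two rows of an existing alignment.

    Rows must have equal length; '-' marks an alignment gap and '.' an
    inherited gap character. Any gapped column becomes a gap character.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    out = []
    for x, y in zip(row_a, row_b):
        if x in "-." or y in "-.":
            out.append(GAP)
        else:
            out.append(minimal_class(x, y))
    return "".join(out)


if __name__ == "__main__":
    print(merge_aligned("ILDK-S", "VFERGT"))
```

Because `merge_aligned` accepts class symbols as well as residues, its output can be merged again with another aligned sequence or pattern, which is how the stepwise reduction toward a single root pattern would proceed under these assumptions.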
