Asia-Pacific Bioinformatics Conference

SE: an algorithm for deriving sequence alignment from a pair of superimposed structures

Abstract

Background: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and amino acid type similarity between the residues are used to resolve conflicts that arise when more than one diagonal is extended. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments.

Results: SE gave an average accuracy of 95.9% over the 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP.

Conclusion: The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than the three other programs tested, all of which use a dynamic programming algorithm.
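
The seed-and-extend idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the seed criteria or the extension rule, so the sketch assumes a hypothetical seed test (mutual nearest neighbours within `seed_cutoff`), a hypothetical extension test (distance within `extend_cutoff`), and resolves conflicts by distance only, leaving out the amino acid similarity term that the paper also uses. All function names and cutoff values are illustrative, and the inputs are assumed to be C-alpha coordinates of two structures that are already superimposed.

```python
import math


def find_seeds(ca1, ca2, seed_cutoff=3.0):
    """Return residue pairs (i, j) that are mutual nearest neighbours
    within seed_cutoff. ASSUMED criterion, not the paper's."""
    seeds = set()
    for i, p in enumerate(ca1):
        j = min(range(len(ca2)), key=lambda k: math.dist(p, ca2[k]))
        i_back = min(range(len(ca1)), key=lambda k: math.dist(ca1[k], ca2[j]))
        if i_back == i and math.dist(p, ca2[j]) <= seed_cutoff:
            seeds.add((i, j))
    return seeds


def seed_segments(seeds):
    """Start pairs of three consecutive seeds on one diagonal:
    (i, j), (i+1, j+1), (i+2, j+2)."""
    return [(i, j) for (i, j) in sorted(seeds)
            if (i + 1, j + 1) in seeds and (i + 2, j + 2) in seeds]


def extend(start, ca1, ca2, extend_cutoff=5.0):
    """Extend a seed segment in both directions along its diagonal while
    paired residues stay within extend_cutoff (ASSUMED rule)."""
    lo_i, lo_j = start
    hi_i, hi_j = lo_i + 2, lo_j + 2
    while (lo_i > 0 and lo_j > 0
           and math.dist(ca1[lo_i - 1], ca2[lo_j - 1]) <= extend_cutoff):
        lo_i, lo_j = lo_i - 1, lo_j - 1
    while (hi_i + 1 < len(ca1) and hi_j + 1 < len(ca2)
           and math.dist(ca1[hi_i + 1], ca2[hi_j + 1]) <= extend_cutoff):
        hi_i, hi_j = hi_i + 1, hi_j + 1
    return [(lo_i + k, lo_j + k) for k in range(hi_i - lo_i + 1)]


def se_align_sketch(ca1, ca2):
    """Collect candidate pairs from all extended segments, then resolve
    conflicts greedily: closest pairs first, rejecting any pair that
    reuses a residue or crosses an already accepted pair."""
    candidates = {}
    for seg in seed_segments(find_seeds(ca1, ca2)):
        for i, j in extend(seg, ca1, ca2):
            candidates[(i, j)] = math.dist(ca1[i], ca2[j])
    aligned = []
    for (i, j), _ in sorted(candidates.items(), key=lambda kv: kv[1]):
        # (i - a) * (j - b) > 0 means the new pair keeps sequence order
        # relative to (a, b) and shares no residue with it.
        if all((i - a) * (j - b) > 0 for a, b in aligned):
            aligned.append((i, j))
    return sorted(aligned)
```

Note that no gap penalty appears anywhere: gaps arise only where no diagonal supplies an accepted pair, which is the property the abstract contrasts with dynamic programming.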