Sequence Database Search Using Jumping Alignments

Abstract

We describe a new algorithm for amino acid sequence classification and the detection of remote homologues. The rationale is to exploit both the vertical and the horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and hidden Markov models, which focus on vertical information because they model the columns of the alignment independently. In our setting, we want to select from a given database of "candidate sequences" those proteins that belong to a given superfamily. To do so, each candidate sequence is separately tested against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Rather, the candidate sequence at each position is aligned to one sequence of the multiple alignment, called the "reference sequence". In addition, the reference sequence may change within the alignment, while each such jump is penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compared it to hidden Markov models on a subset of the SCOP database of protein domains. The discriminative quality was assessed by counting the number of false positives that ranked higher than the first true positive (FP-count). For moderate FP-counts above five, the number of successful searches with our method was considerably higher than with hidden Markov models.
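
The core idea can be illustrated with a small dynamic program. The following is only a minimal sketch under simplifying assumptions, not the authors' implementation: it uses identity scoring (`match`/`mismatch`) instead of an amino acid substitution matrix, linear instead of affine gap costs, and lets columns where the current reference row has a gap be skipped at no cost. The function name `jumping_alignment_score` and all parameter names and default values are hypothetical and chosen for illustration.

```python
def jumping_alignment_score(candidate, alignment,
                            match=2, mismatch=-1, gap=2, jump=3):
    """Best local score of `candidate` against the rows of `alignment`.

    `alignment` is a list of equal-length strings ('-' marks a gap).
    `gap` and `jump` are penalties (positive numbers that are subtracted).
    All scoring values are illustrative assumptions.
    """
    rows, cols, n = len(alignment), len(alignment[0]), len(candidate)
    # D[i][j][r]: best score of a local alignment ending at candidate
    # position i and alignment column j, with row r as reference sequence.
    D = [[[0] * rows for _ in range(cols + 1)] for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, cols + 1):
            cell = D[i][j]
            for r in range(rows):
                ref = alignment[r][j - 1]
                if ref == '-':
                    # Reference row has no residue here: skip the column
                    # for free, or leave candidate[i-1] unaligned.
                    cell[r] = max(0, D[i][j - 1][r], D[i - 1][j][r] - gap)
                else:
                    sub = match if candidate[i - 1] == ref else mismatch
                    cell[r] = max(0,
                                  D[i - 1][j - 1][r] + sub,  # align residues
                                  D[i][j - 1][r] - gap,      # skip column
                                  D[i - 1][j][r] - gap)      # skip residue
            # Jump step: switch to any other reference row at a fixed penalty.
            top = max(cell)
            for r in range(rows):
                cell[r] = max(cell[r], top - jump)
            best = max(best, top)
    return best


# A candidate that matches row 0 in its first half and row 1 in its second.
msa = ["AAAACCCC", "GGGGTTTT"]
print(jumping_alignment_score("AAAATTTT", msa))            # jump is worthwhile
print(jumping_alignment_score("AAAATTTT", msa, jump=100))  # jump too expensive
```

With a moderate jump penalty, the chimeric candidate is aligned to row 0 and then to row 1, paying one jump; with a prohibitive penalty the score falls back to the best single-row local alignment. A column-summarizing model cannot express this distinction, since it does not track which row a match came from.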
