Journal: Proteins: Structure, Function, and Genetics

Large-scale comparison of protein sequence alignment algorithms with structure alignments.


Abstract

Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/ SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program. 
In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling. Copyright 2000 Wiley-Liss, Inc.
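The abstract distinguishes two ways of scoring a sequence alignment against a structure alignment: accuracy per aligned residue, and the fraction of the structure alignment that is recovered. The following is a minimal sketch of that distinction, not the authors' actual procedure. It assumes an alignment is given as two gapped rows with `-` for gaps plus 0-based start offsets for local alignments; the function names `aligned_pairs` and `alignment_agreement` and the toy sequences are hypothetical.

```python
def aligned_pairs(row_a, row_b, start_a=0, start_b=0):
    """Return the set of (i, j) residue-index pairs matched by a gapped alignment.

    row_a, row_b: equal-length alignment rows using '-' for gaps (assumed format).
    start_a, start_b: 0-based index of the first residue of each row within its
    full sequence, so that local alignments can be compared to a global reference.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i, j = start_a, start_b
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_agreement(seq_pairs, struct_pairs):
    """Compare a sequence alignment with a reference structure alignment.

    Returns (n_correct, per_residue_accuracy, fraction_of_structure_alignment):
    - per_residue_accuracy: correct pairs / pairs in the sequence alignment
    - fraction_of_structure_alignment: correct pairs / pairs in the reference
    """
    n_correct = len(seq_pairs & struct_pairs)
    per_residue = n_correct / len(seq_pairs) if seq_pairs else 0.0
    coverage = n_correct / len(struct_pairs) if struct_pairs else 0.0
    return n_correct, per_residue, coverage


# Toy reference standing in for a structure alignment of two 10-residue domains.
reference = aligned_pairs("ACDEFGHIKL", "ACDEFGHIKL")

# A short local alignment that is entirely correct.
short_local = aligned_pairs("DEFG", "DEFG", start_a=2, start_b=2)

# A longer local alignment with one misaligned residue pair.
shifted_local = aligned_pairs("DEFG-H", "DE-FGH", start_a=2, start_b=2)

short_result = alignment_agreement(short_local, reference)
shifted_result = alignment_agreement(shifted_local, reference)
print(short_result)
print(shifted_result)
```

In this toy case the short alignment is correct at every aligned position yet covers less than half of the reference, which is the same trade-off the abstract describes between per-residue accuracy and alignment length.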
