Journal of Molecular Biology

Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments.

Abstract

To understand how protein segments are inserted and deleted during divergent evolution, a set of pairwise alignments, each containing exactly one gap and therefore arising from the first insertion-deletion (indel) event in the time separating the homologs, was examined. The alignments showed that "structure-breaking" amino acids (PGDNS) are preferred within and flanking gapped regions, as are two residues with hydrophilic side-chains (QE) that frequently occur at the surface of protein folds. Conversely, hydrophobic residues (FMILYVW) occur infrequently within and flanking the gapped region. These preferences differ modestly between protein pairs separated by an episode of adaptive evolution and pairs diverging under strong functional constraints. Surprisingly, regions near an indel have not evolved more rapidly than the sequence pair overall, so there is no evidence that an indel event must be compensated by local amino acid replacement. The gap lengths are best approximated by a Zipfian (power-law) distribution, with the probability of a gap of length L decreasing as a function of [Formula: see text]. These features are largely independent of the length of the gap and of the extent of divergence (measured by both silent and non-silent sequence changes) separating the two proteins. Unexpectedly, amino acid repeats were discovered in more than a third of the polypeptide segments in and around the gap. These correspond to repeats in the DNA sequence, which suggests that a signature of the mechanism by which indels occur in the DNA sequence remains in the encoded protein sequences. These data suggest specific tools to score gap placement in an alignment. They also suggest tools that distinguish true indels from gaps created by mistaken gene finding, including under-predicted and over-predicted introns. By providing mechanisms to identify errors, the tools will enhance the value of genome sequence databases in support of integrated paleogenomics strategies used to extract functional information in a post-genomic environment.
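As a rough illustration of how a Zipfian gap-length model could feed into gap scoring, the sketch below normalizes a power law over gap lengths 1 to a chosen maximum and turns each probability into a negative log penalty. It is only a sketch under stated assumptions: the function names, the maximum length of 50, and the exponent of 1.5 are hypothetical placeholders, since the fitted exponent is not given in this abstract, and this is not the scoring scheme used by the authors.

```python
import math


def zipf_gap_probabilities(max_length, exponent):
    """Return P(L) for L = 1..max_length, with P(L) proportional to L**(-exponent)."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    weights = [length ** (-exponent) for length in range(1, max_length + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def gap_length_penalty(length, probabilities):
    """Negative log-probability of a gap of this length (larger means less likely)."""
    return -math.log(probabilities[length - 1])


# Hypothetical values for illustration only; the abstract elides the fitted exponent.
probs = zipf_gap_probabilities(max_length=50, exponent=1.5)
penalties = [gap_length_penalty(length, probs) for length in (1, 2, 5, 10)]
```

Under a power law the penalty grows with the logarithm of the gap length, so long gaps are penalized less steeply than under a penalty that rises linearly with length.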