Journal: BMC Bioinformatics

On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

Abstract

Background: Protein sequence similarities to any type of non-globular segment (coiled coils, low-complexity regions, transmembrane regions, long loops, etc., where either positional sequence conservation is the result of a very simple, physically induced pattern or integral sequence properties are critical) are pertinent sources of mistaken homologies. Regrettably, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute for manual handling of these cases. Quantitative criteria are required to suppress function annotation transfer that results from false homology assignments.

Results: The sequence homology concept is based on the similarity comparison between structural elements, the basic building blocks that confer the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept to cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt.

Conclusions: Sequence similarity score dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison.
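
The dissection idea in the Results can be illustrated with a minimal sketch. It is not the DissectHMMER implementation: the `Segment` type, the per-position score list, and the plain score threshold are all hypothetical stand-ins chosen for illustration (a real tool would work with HMMER score contributions and a proper significance estimate). The sketch only shows the bookkeeping: sum the contributions that fall in fold-critical segments separately from the rest, and accept a hit only if the fold-critical part passes on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Segment:
    """Hypothetical annotation: half-open range [start, end) of model positions."""
    start: int
    end: int
    fold_critical: bool


def dissect_score(per_position: Sequence[float],
                  segments: List[Segment]) -> Dict[str, float]:
    """Split a total score into fold-critical and remaining contributions.

    Positions not covered by a fold-critical segment count as 'remaining'.
    """
    critical = [False] * len(per_position)
    for seg in segments:
        if seg.fold_critical:
            lo = max(seg.start, 0)
            hi = min(seg.end, len(per_position))
            for i in range(lo, hi):
                critical[i] = True
    total = sum(per_position)
    fold = sum(s for s, is_crit in zip(per_position, critical) if is_crit)
    return {"total": total, "fold_critical": fold, "remaining": total - fold}


def supports_homology(parts: Dict[str, float], threshold: float) -> bool:
    """Assumed decision rule: the fold-critical part alone must reach the threshold."""
    return parts["fold_critical"] >= threshold


# Made-up per-position contributions: positions 0-2 lie in a fold-critical
# segment, positions 3-4 in a non-globular segment (e.g. a transmembrane region).
scores = [1.0, 2.0, -0.5, 4.0, 4.5, 0.5]
annotation = [Segment(0, 3, True), Segment(3, 5, False)]
parts = dissect_score(scores, annotation)
print(parts)
print(supports_homology(parts, threshold=5.0))
```

In this made-up example the total score would clear a threshold of 5.0, but most of it comes from the non-globular segment, so the hit is rejected once only the fold-critical contribution is considered.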
