Journal: Genes

Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Errors Caused by Confusing Paralogs and Epaktologs

Abstract

In the accompanying paper (Nagy, Szláma, Szarka, Trexler, Bányai, Patthy, Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors) we showed that, for the UniProtKB/TrEMBL, RefSeq, EnsEMBL and NCBI GNOMON predicted protein sequences of Metazoan species, the contribution of erroneous (incomplete, abnormal, mispredicted) sequences to domain architecture (DA) differences of orthologous proteins might be greater than that of true gene rearrangements. Based on these findings, we suggest that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. In this manuscript we examine how confusing paralogous and epaktologous multidomain proteins (i.e., those that are related only through the independent acquisition of the same domain types) affects conclusions drawn about DA evolution of multidomain proteins in Metazoa. To estimate the contribution of this type of error, we used as reference UniProtKB/Swiss-Prot sequences from protein families with well-characterized evolutionary histories. We used two types of paralogy-group construction procedures and, on correctly annotated Swiss-Prot entries of multidomain proteins, monitored the impact of various parameters on the separation of true paralogs from epaktologs. Our studies have shown that, although public protein family databases are contaminated with epaktologs, analysis of the structure of sequence similarity networks of multidomain proteins provides an efficient means of separating epaktologs from paralogs. We have also demonstrated that contamination of protein families with epaktologs increases the apparent rate of DA change and biases the observed DA differences, inasmuch as it increases the proportion of terminal over internal DA differences.
We have shown that confusing paralogous and epaktologous multidomain proteins significantly increases the apparent rate of DA change in Metazoa and introduces a positional bias in favor of terminal over internal DA changes. Our findings caution that earlier studies based on analysis of datasets of protein families that were contaminated with epaktologs may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of multidomain proteins is presented in an accompanying paper [1].
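The distinction between terminal and internal DA differences can be illustrated with a minimal sketch. It is not the procedure used in the paper. It assumes that each protein's domain architecture is given as an ordered list of domain names from N-terminus to C-terminus, and that a difference counts as "terminal" when one architecture is a prefix or suffix of the other. The function name `classify_da_difference` and the example domain names are hypothetical and chosen only for illustration.

```python
from typing import List


def classify_da_difference(da_a: List[str], da_b: List[str]) -> str:
    """Classify the difference between two domain architectures.

    Returns "identical", "terminal", "internal", or "complex".
    Assumption: a single contiguous gain/loss of domains is the only
    event considered; anything else is reported as "complex".
    """
    if da_a == da_b:
        return "identical"

    shorter, longer = sorted((da_a, da_b), key=len)

    # Length of the common prefix, capped at the shorter architecture.
    prefix = 0
    while prefix < len(shorter) and shorter[prefix] == longer[prefix]:
        prefix += 1

    # Length of the common suffix, capped at the shorter architecture.
    suffix = 0
    while suffix < len(shorter) and shorter[-1 - suffix] == longer[-1 - suffix]:
        suffix += 1

    # The shorter architecture is a prefix or suffix of the longer one:
    # the extra domains sit at the C- or N-terminal end.
    if prefix == len(shorter) or suffix == len(shorter):
        return "terminal"

    # Prefix and suffix together cover the shorter architecture:
    # the extra domains are flanked on both sides, i.e. internal.
    if prefix + suffix >= len(shorter):
        return "internal"

    # Substitutions, multiple events, or unrelated architectures.
    return "complex"


if __name__ == "__main__":
    print(classify_da_difference(["EGF", "Kringle"], ["EGF", "Kringle", "Trypsin"]))
    print(classify_da_difference(["EGF", "Trypsin"], ["EGF", "Kringle", "Trypsin"]))
```

When a difference could be read either way (for example `["A", "B"]` versus `["A", "A", "B"]`), this sketch reports it as terminal. A real analysis would need an explicit rule for such tandem-repeat cases.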
