Journal: PLoS Computational Biology

Improvement in Protein Domain Identification Is Reached by Breaking Consensus, with the Agreement of Many Profiles and Domain Co-occurrence



Abstract

Author Summary. Current sequence databases contain hundreds of billions of nucleotides coding for genes, and classifying these sequences is a primary problem in genomics. A reasonable way to organize these sequences is through their predicted domains, but identifying domains in very divergent sequences, spanning the entire phylogenetic tree of species, is difficult. By generating multiple probabilistic models for a domain, describing the spread of evolutionary patterns in different phylogenetic clades, we can effectively explore the domains that are likely to be coded in gene sequences. Through a machine learning approach and optimization techniques that encode expected evolutionary constraints, we filter the many candidate domain identifications found for a gene and propose the most likely domain architecture associated with it. Applying this approach to the full genome of Plasmodium falciparum and to a dataset of sequences from three SCOP datasets highlights the value of exploring multiple pathways of domain evolution in order to extract biological information from genomic sequences. Our computational approach was developed with the hope of providing a new tier of accurate and precise tools that complement existing tools such as HMMer, HHblits and PSI-BLAST by exploring in a new way the large amount of sequence data available. The existence of powerful databases for sequences, domains and architectures helps make this hope a reality.
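
To make the filtering step concrete, here is a minimal sketch of one way to reduce many candidate domain hits on a sequence to a single non-overlapping architecture. It is not the authors' method: it uses plain weighted interval scheduling on hit scores and ignores both the machine learning filter and the domain co-occurrence constraints mentioned in the summary. The `Hit` type, the `best_architecture` function, and all coordinates and scores are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A candidate domain hit on a protein sequence (hypothetical structure)."""
    domain: str   # domain label, e.g. a family identifier
    start: int    # first residue covered (inclusive)
    end: int      # last residue covered (inclusive)
    score: float  # any "higher is better" confidence score


def best_architecture(hits: List[Hit]) -> List[Hit]:
    """Return the non-overlapping subset of hits with the highest total score.

    Classic weighted interval scheduling: sort hits by end position, then for
    each hit either skip it or combine it with the best solution built from
    hits that end before it starts.
    """
    ordered = sorted(hits, key=lambda h: h.end)
    ends = [h.end for h in ordered]
    # best[i] holds (total score, chosen hits) over the first i ordered hits.
    best = [(0.0, [])]
    for i, hit in enumerate(ordered):
        # Count of earlier hits that end strictly before this hit starts.
        j = bisect_right(ends, hit.start - 1, 0, i)
        take_score = best[j][0] + hit.score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [hit]))
        else:
            best.append(best[i])
    return best[-1][1]


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    candidates = [
        Hit("A", 1, 100, 50.0),
        Hit("B", 80, 200, 60.0),
        Hit("C", 101, 180, 30.0),
        Hit("D", 210, 300, 20.0),
    ]
    for h in best_architecture(candidates):
        print(h.domain, h.start, h.end, h.score)
```

In this toy input, hit B has the highest individual score but overlaps both A and C, so the sketch keeps A, C and D, whose combined score is higher. A co-occurrence-aware variant would additionally reward or penalize pairs of domains in the chosen set, which this sketch does not attempt.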
