Conference: Twenty-Third International Workshop on Database and Expert Systems Applications

Enhancing Protein Domain Detection Using Domain Co-occurrence and Domain Exclusion



Abstract

Among the relevant annotations that can be attributed to a protein, domains occupy a key position. Protein domains are sequential and structural motifs that are found independently in different proteins and in different combinations. One of the most widely used domain schemes is the Pfam database, which is a collection of protein domains and families. Each family in Pfam is represented by a multiple sequence alignment and a Hidden Markov Model (HMM). When analyzing a new protein sequence, each Pfam HMM is used to compute a score measuring the similarity between the sequence and the domain. If the score is above a given threshold provided by Pfam, the presence of the domain can be asserted in the protein. However, when applied to proteins of organisms with high evolutionary distance from classical model organisms, this strategy may miss several domains. We recently proposed a method, the Co-Occurrence Domain Detection approach (CODD), that improves the sensitivity of Pfam domain detection by exploiting the tendency of domains to appear preferentially with a few other favorite domains in a protein. Here, we propose to integrate domain exclusion information to prune false positive domains that are in conflict with other domains of the protein. Applied to P. falciparum and L. major proteins, we show that this strategy substantially reduces the proportion of false positives among the new domains predicted by CODD, while preserving the sensitivity of the approach as much as possible.
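The three ideas in the abstract (threshold-based detection, rescue by co-occurrence, pruning by exclusion) can be illustrated with a small Python sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract does not say how co-occurring or mutually exclusive domain pairs are determined, nor how low-scoring candidates are selected, so the sketch assumes both pair sets are given as input and that a hypothetical relaxed threshold defines the candidates. All names (`detect_domains`, `relaxed`, the domain labels) and all scores are invented for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered domain pair, so that (A, B) and (B, A) compare equal."""
    return frozenset((a, b))


def detect_domains(
    scores: Dict[str, float],      # HMM score of each domain on one protein
    thresholds: Dict[str, float],  # per-domain threshold (as provided by Pfam)
    relaxed: Dict[str, float],     # ASSUMPTION: lower cutoff for candidates
    cooccurring: Set[Pair],        # ASSUMPTION: precomputed co-occurring pairs
    exclusive: Set[Pair],          # ASSUMPTION: precomputed exclusive pairs
) -> Tuple[Set[str], Set[str]]:
    """Return (confident domains, additional domains kept after pruning)."""
    # Step 1: standard detection, score at or above the domain's threshold.
    confident = {d for d, s in scores.items() if s >= thresholds[d]}

    # Step 2: co-occurrence rescue. A sub-threshold candidate is accepted
    # if it co-occurs with at least one confidently detected domain.
    rescued = {
        d
        for d, s in scores.items()
        if d not in confident
        and s >= relaxed[d]
        and any(pair(d, c) in cooccurring for c in confident)
    }

    # Step 3: exclusion pruning. Drop a rescued domain that conflicts
    # with any confidently detected domain of the same protein.
    kept = {
        d
        for d in rescued
        if not any(pair(d, c) in exclusive for c in confident)
    }
    return confident, kept


if __name__ == "__main__":
    # Invented scores for five hypothetical domains on one protein.
    scores = {"A": 30.0, "E": 25.0, "B": 12.0, "C": 11.0, "D": 3.0}
    thresholds = {d: 20.0 for d in scores}
    relaxed = {d: 10.0 for d in scores}
    cooccurring = {pair("A", "B"), pair("A", "C")}
    exclusive = {pair("C", "E")}
    confident, kept = detect_domains(
        scores, thresholds, relaxed, cooccurring, exclusive
    )
    print(sorted(confident), sorted(kept))
```

In this toy input, domains B and C both fall below the standard threshold and both co-occur with the confidently detected domain A, but C is assumed to be exclusive with E, so only B survives pruning. Domain D is never considered because its score is below the relaxed cutoff.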
