Conference: IEEE International Conference on Data Mining Workshops

Mining Protein Sequence Databases for Remote Homologues That Can Display Considerable Domain Length Variations


Abstract

Protein domains are minimal structural units that can fold independently and carry out discrete biological functions. Evolutionary divergence among proteins not only causes considerable sequence changes in protein domains of similar fold and function, but can also give rise to remarkable length variations. Rapid, heuristic sequence search algorithms are generally sensitive and effective at recognizing distantly related protein domains in large sequence databases, but they are not well suited to identifying remote homologues of varying lengths. It is also challenging to distinguish reliable hits from the vast number of putative false positives that may show suboptimal sequence similarity. Here, we present a data-mining approach that applies stage-specific filters in sequence searches to reliably accumulate remote homologues. The approach encourages sampling of length variations without compromising the validity of the distant relationships identified so far. Recognition of remote homologues with marked length variations could contribute to a better understanding of functional variety within protein domain superfamilies.
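
The abstract does not say which filters are applied at each stage, so the following is only an illustrative sketch of the general idea of stage-specific filtering, not the authors' method. It assumes each search stage yields hits carrying an E-value, a query coverage, and a subject length, and that each stage has its own hypothetical thresholds. The names `Hit`, `StageFilter`, `accumulate_homologues`, and `length_range`, and all threshold values, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hit:
    """One database hit from a sequence search (hypothetical record layout)."""
    subject_id: str
    evalue: float
    query_coverage: float  # fraction of the query domain covered by the alignment
    subject_length: int    # length of the matched domain in residues


@dataclass(frozen=True)
class StageFilter:
    """Thresholds applied at one search stage (values are assumptions)."""
    max_evalue: float
    min_query_coverage: float


def passes(hit: Hit, flt: StageFilter) -> bool:
    """Return True if the hit satisfies both thresholds of this stage."""
    return hit.evalue <= flt.max_evalue and hit.query_coverage >= flt.min_query_coverage


def accumulate_homologues(
    stage_hits: Sequence[List[Hit]],
    stage_filters: Sequence[StageFilter],
) -> Dict[str, Hit]:
    """Collect hits stage by stage, keeping the first accepted record per subject."""
    accepted: Dict[str, Hit] = {}
    for hits, flt in zip(stage_hits, stage_filters):
        for hit in hits:
            if passes(hit, flt):
                accepted.setdefault(hit.subject_id, hit)
    return accepted


def length_range(accepted: Dict[str, Hit]) -> Optional[Tuple[int, int]]:
    """Shortest and longest accepted domain lengths, or None if nothing was accepted."""
    lengths = [hit.subject_length for hit in accepted.values()]
    return (min(lengths), max(lengths)) if lengths else None


# Toy data: a strict first stage and a more permissive second stage.
example_hits = [
    [Hit("A", 1e-30, 0.9, 120), Hit("B", 1e-2, 0.9, 200)],
    [Hit("B", 1e-6, 0.7, 200), Hit("C", 1e-8, 0.3, 310)],
]
example_filters = [StageFilter(1e-10, 0.8), StageFilter(1e-5, 0.6)]
example_accepted = accumulate_homologues(example_hits, example_filters)
print(sorted(example_accepted), length_range(example_accepted))
```

In this toy run, hit B is rejected at the strict first stage but accepted at the second, while C is rejected for low coverage despite a good E-value. Applying a separate filter per stage is one plausible way to widen the sampled length range while still screening out weak matches.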
