Journal: Bioinformatics

Variations on probabilistic suffix trees: statistical modeling and prediction of protein families.

Abstract

MOTIVATION: We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The patterns can be of arbitrary length, and the input sequences need not be aligned, nor do domain boundaries need to be delineated. The method is automatic and can be applied, with surprising success, without assuming any preliminary biological information. Basic biological considerations, such as amino acid background probabilities and amino acid substitution probabilities, can be incorporated to improve performance. RESULTS: The PST can serve as a predictive tool for protein sequence classification and for detecting conserved patterns (possibly functionally or structurally important) within protein sequences. The method was tested on the Pfam database of protein families with more than satisfactory performance. Exhaustive evaluations show that the PST model detects many more related sequences than pairwise methods such as Gapped-BLAST, and is almost as sensitive as a hidden Markov model trained from a multiple alignment of the input sequences, while being much faster.
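
To make the idea concrete, the following is a minimal sketch of a variable-order context model in the spirit of a PST, not the authors' algorithm. It learns next-residue counts for short contexts from unaligned sequences, predicts each residue from the longest stored suffix of its preceding context, and scores a query by its average log-probability per residue. The function names (`train_pst`, `next_prob`, `score`), the parameters `max_depth`, `min_count`, and `pseudocount`, and the simple count-threshold pruning are all assumptions made for illustration; sequences are assumed to use only the 20 standard amino acid letters.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_pst(sequences, max_depth=3, min_count=2):
    """Count next-symbol frequencies for every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(min(i, max_depth) + 1):
                # seq[i - k:i] is the k residues preceding position i ("" when k == 0).
                counts[seq[i - k:i]][symbol] += 1
    # Always keep the empty (root) context; keep longer contexts only if
    # they were observed at least min_count times (hypothetical pruning rule).
    return {
        ctx: dict(nxt)
        for ctx, nxt in counts.items()
        if ctx == "" or sum(nxt.values()) >= min_count
    }


def next_prob(model, context, symbol, pseudocount=0.5):
    """Probability of `symbol` given the longest suffix of `context` in the model."""
    for start in range(len(context) + 1):
        suffix = context[start:]
        if suffix in model:
            nxt = model[suffix]
            total = sum(nxt.values()) + pseudocount * len(ALPHABET)
            return (nxt.get(symbol, 0) + pseudocount) / total
    # Untrained model: fall back to a uniform distribution.
    return 1.0 / len(ALPHABET)


def score(model, seq, max_depth=3):
    """Average log-probability per residue; higher means more family-like."""
    if not seq:
        return 0.0
    logp = 0.0
    for i, symbol in enumerate(seq):
        context = seq[max(0, i - max_depth):i]
        logp += math.log(next_prob(model, context, symbol))
    return logp / len(seq)


if __name__ == "__main__":
    # Toy, made-up "family" of unaligned sequences, for illustration only.
    family = ["ACDEFGHIK", "ACDEFGHLK", "MACDEFGHIK"]
    model = train_pst(family)
    print(score(model, "ACDEFGHIK"), score(model, "WWYYPPQQNN"))
```

In this sketch a sequence resembling the training set should score higher than an unrelated one, which is the basis for using such a model as a classifier. The pseudocount stands in, very loosely, for the background-probability smoothing the abstract mentions.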
