Proceedings of the Second conference on Asia-Pacific bioinformatics

Local prediction approach for protein classification using probabilistic suffix trees


Abstract

A probabilistic suffix tree (PST) is a stochastic model that uses a suffix tree as an index structure to store conditional probabilities associated with subsequences. PSTs have been used successfully to model and predict protein families following a global approach. That approach takes the entire sequence into account and is therefore not suitable for partially conserved families. We develop two variants of PST for local prediction: multiple-domain prediction and best-domain prediction. The multiple-domain method predicts the probability that a protein belongs to a family based on one or more significant conserved regions, while the best-domain method bases it on the most conserved region in the query sequence. The time complexity of both approaches is the same as that of global prediction, namely O(Lm), where L is the depth bound of the tree and m is the length of the query sequence. We tested our algorithms on the Pfam database of protein families and compared the results with the global prediction method. The experimental results show that our approaches achieve higher prediction accuracy than the global approach. We also show that our local prediction approach is an effective way to extract motifs/domains. Our approaches employ a linear-time method for building the PST, adapted from the linear-time construction of probabilistic automata reported by A. Apostolico et al.
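
To make the contrast between global and best-domain scoring concrete, the following is a minimal Python sketch, not the authors' algorithm. It rests on several assumptions that do not come from the abstract: a plain dictionary of contexts stands in for the suffix tree, probabilities use add-one smoothing, each residue is scored as a log-odds ratio against a uniform background, and the "best domain" is taken to be the contiguous segment with the highest summed score (found with a maximum-subarray scan). All function names are hypothetical.

```python
import math
from collections import defaultdict


def train_counts(seqs, max_depth):
    """Count next-symbol occurrences for every context of length 0..max_depth.

    A dict of contexts is used here as a stand-in for a suffix tree (assumption).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i, sym in enumerate(s):
            for d in range(min(max_depth, i) + 1):
                counts[s[i - d:i]][sym] += 1
    return counts


def cond_prob(counts, seq, i, alphabet_size, max_depth):
    """P(seq[i] | longest stored suffix of seq[:i]), with add-one smoothing."""
    ctx = seq[max(0, i - max_depth):i]
    # Shorten the context from the left until it is one we have seen.
    while ctx and ctx not in counts:
        ctx = ctx[1:]
    nxt = counts[ctx] if ctx in counts else {}
    return (nxt.get(seq[i], 0) + 1) / (sum(nxt.values()) + alphabet_size)


def log_odds(counts, query, alphabet_size, max_depth):
    """Per-position log-odds against a uniform background (assumption)."""
    bg = 1.0 / alphabet_size
    return [
        math.log(cond_prob(counts, query, i, alphabet_size, max_depth) / bg)
        for i in range(len(query))
    ]


def global_score(scores):
    """Global prediction: every position of the query contributes."""
    return sum(scores)


def best_domain(scores):
    """Best-domain prediction: highest-scoring contiguous segment.

    Returns (score, (start, end)) with end exclusive; scores must be non-empty.
    """
    best, best_span = float("-inf"), (0, 0)
    run, run_start = 0.0, 0
    for i, x in enumerate(scores):
        if run <= 0:
            run, run_start = x, i   # start a new segment here
        else:
            run += x                # extend the current segment
        if run > best:
            best, best_span = run, (run_start, i + 1)
    return best, best_span


# Toy family sharing the motif "ACDE"; the query has unrelated flanks.
family = ["GGACDEGG", "TTACDETT", "ACDEKK"]
model = train_counts(family, max_depth=3)
scores = log_odds(model, "KLMNPACDEQRSTV", alphabet_size=20, max_depth=3)
print("global:", global_score(scores))
print("best domain:", best_domain(scores))
```

In this sketch each query position needs at most L context lookups and the segment scan is a single pass over m scores, which is in the spirit of the O(Lm) bound stated above. Because the whole query is itself one candidate segment, the best-domain score can never fall below the global score, which illustrates why a local score is less penalized by unconserved flanking regions.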
