Conference: Annual International Conference on Research in Computational Molecular Biology

Using a Mixture of Probabilistic Decision Trees for Direct Prediction of Protein Function

Abstract

We study the direct relationship between basic protein properties and protein function. Our goal is to develop a new tool for functional prediction that can complement and support other techniques based on sequence or structure information. To define this new measure of similarity between proteins, we collected a set of 453 features and properties that characterize proteins and are believed to correlate with their structural and functional aspects. These properties include the composition and fraction of different groups of amino acids, predicted secondary structure content, molecular weight, average hydrophobicity, isoelectric point, and others, as well as properties extracted from database records of known protein sequences, such as subcellular location and tissue specificity. We introduce the mixture model of probabilistic decision trees to learn the set of potentially complex relationships between features and function. To study these correlations, trees are created and tested on the Pfam sequence-based classification of proteins and the EC classification of enzyme families. The model is very effective in learning highly diverged protein families or families that are not defined based on sequence. The resulting tree structure indicates the properties that are strongly correlated with structural and functional aspects of protein families, and can be used to suggest a concise definition of a protein family.
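To make the idea of a mixture of probabilistic decision trees concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each tree leaf stores a class probability distribution and that the mixture is a weighted average of the leaf distributions reached in each tree. The abstract does not specify how the trees are learned or weighted, so the hydrophobic residue grouping, the two features, the thresholds, the weights, and all function names here are hypothetical.

```python
from collections import Counter

# Illustrative residue grouping (an assumption, not taken from the paper).
HYDROPHOBIC = set("AVLIMFWC")


def features(seq):
    """Compute two simple sequence-derived features for a protein sequence."""
    counts = Counter(seq)
    n = len(seq)
    return {
        "frac_hydrophobic": sum(counts[a] for a in HYDROPHOBIC) / n,
        "length": n,
    }


def tree_predict(tree, x):
    """Walk one tree down to a leaf and return its class distribution.

    A tree is either a leaf (dict: class label -> probability) or an
    internal node (feature_name, threshold, left_subtree, right_subtree).
    """
    while isinstance(tree, tuple):
        name, threshold, left, right = tree
        tree = left if x[name] <= threshold else right
    return tree


def mixture_predict(trees, weights, x):
    """Weighted average of the leaf distributions reached in each tree."""
    total = sum(weights)
    mixed = {}
    for tree, w in zip(trees, weights):
        for label, p in tree_predict(tree, x).items():
            mixed[label] = mixed.get(label, 0.0) + (w / total) * p
    return mixed


# Two hand-written toy trees (hypothetical thresholds and probabilities).
tree_a = ("frac_hydrophobic", 0.5,
          {"family": 0.2, "other": 0.8},
          {"family": 0.9, "other": 0.1})
tree_b = ("length", 10,
          {"family": 0.6, "other": 0.4},
          {"family": 0.3, "other": 0.7})

x = features("AVLIMKDE")
result = mixture_predict([tree_a, tree_b], [1.0, 1.0], x)
print(result)
```

Because each leaf holds a distribution rather than a hard label, the mixture output remains a valid probability distribution over classes whenever the weights are positive.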