Conference: International Conference on Life System Modeling and Simulation (LSMS 2007); September 14-17, 2007; Shanghai (CN)

A New Hybrid Approach to Predict Subcellular Localization by Incorporating Protein Evolutionary Conservation Information

Abstract

The rapidly increasing number of sequences entering genome databanks has created the need for fully automated methods to analyze them. Knowing the cellular location of a protein is a key step towards understanding its function. The development of statistical predictors of protein attributes generally consists of two core tasks: constructing a training dataset and formulating a predictive algorithm. The latter can be further divided into two subtasks: finding a mathematical expression that effectively represents a protein, and finding a powerful algorithm that performs the prediction accurately. Here, an improved evolutionary conservation algorithm is proposed to calculate a per-residue conservation score. Each protein can then be represented as a feature vector created with multi-scale energy (MSE). In addition, the protein can be represented by other feature vectors based on amino acid composition (AAC), the weighted auto-correlation function, and moment descriptor methods. Finally, a novel hybrid approach was developed by fusing the four kinds of feature classifiers through a product rule system to predict 12 subcellular locations. Compared with existing methods, this new approach provides better predictive performance. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing protein evolutionary information and fusing multi-feature classifiers are quite promising, and may also hold great potential as a useful vehicle for other areas of molecular biology.
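Two of the ingredients named in the abstract, the amino acid composition (AAC) feature vector and product-rule fusion of several classifiers, can be illustrated with a minimal Python sketch. The abstract does not say which base classifiers were used or how their outputs were scaled, so the sketch assumes each classifier already returns one probability per class. The names `aac_vector` and `product_rule` are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_vector(sequence):
    """Return the 20-dimensional amino acid composition of a sequence.

    Each entry is the fraction of residues of one amino acid type.
    Non-standard symbols (e.g. 'X') are ignored, which is an assumption
    of this sketch rather than something stated in the abstract.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def product_rule(probabilities):
    """Fuse classifier outputs by multiplying their per-class probabilities.

    probabilities: one row per classifier, one column per class.
    Returns (index of the winning class, normalized fused scores).
    """
    n_classes = len(probabilities[0])
    if any(len(row) != n_classes for row in probabilities):
        raise ValueError("all classifiers must score the same classes")
    fused = [math.prod(row[c] for row in probabilities) for c in range(n_classes)]
    total = sum(fused)
    if total == 0:
        raise ValueError("every class received zero probability")
    fused = [f / total for f in fused]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Illustrative values only: two classifiers scoring two locations.
    features = aac_vector("MKTAYIAKQR")
    best, fused = product_rule([[0.6, 0.4], [0.3, 0.7]])
    print(len(features), best, fused)
```

In the approach described above, four such rows (one per feature type: MSE, AAC, weighted auto-correlation, moment descriptors) over 12 location classes would be fused. A known property of the product rule is that a single classifier assigning near-zero probability to a class effectively vetoes it, so practical implementations often smooth the probabilities first.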
