Predicting protein function using sequence derived features selected by genetic algorithms.

Abstract

Large-scale sequencing of genomes has created a peculiar problem for biology: there is now a glut of information in the form of nucleotide sequences, but deciphering the higher-level annotations buried in the nucleotides remains unsolved to varying extents. One aspect of this problem is deducing the function of a protein from its sequence. This is an important challenge because the number of raw protein sequences far surpasses the number of well-characterized proteins. We address this problem by using computational methods to predict protein function from sequence-derived features.

We describe a method for predicting protein function, in terms of enzymatic activity classification (Enzyme Commission numbers), using only the protein sequence. The method begins by generating sequence-derived features for a protein that range from the amino acid composition to predicted features such as secondary structure and solvent accessibility. To capture the local environment surrounding a key residue (a residue involved in catalysis, for example), the method searches for combinations of these features that have predictive power when they occur at the same residue. The learning algorithm may find that a particular amino acid residue is a good indicator of some protein function when another sequence-derived feature indicates that the residue is predicted to be at the surface of a protein or to be in a beta-sheet secondary structure element.

These predictive combinations of features are detected by a genetic algorithm used as a wrapper around a neural network. By incorporating features in the environment surrounding a single residue, the method may be seen as a specialized motif detector that detects instances of these combined features that are correlated with protein function. We evaluate the performance of this method across 59 enzymatic activity classes and find that the genetic-algorithm-based selection of feature combinations is able to significantly increase the predictive power of the method.
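
The wrapper idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the string encodings (`E` for strand, `e` for exposed), and all parameters are hypothetical, and a nearest-centroid classifier stands in for the neural network so that the example needs only the standard library. `combo_count` counts residues where an amino acid, a predicted secondary structure state, and a predicted accessibility state coincide; `select_features` runs a small genetic algorithm over bitmasks of such feature columns, scoring each mask by the leave-one-out accuracy of the stand-in classifier.

```python
import random

def combo_count(seq, ss, acc, combo):
    """Count residues whose (amino acid, secondary structure, accessibility)
    triple equals `combo`. The three strings are assumed to be aligned."""
    return sum(1 for triple in zip(seq, ss, acc) if triple == combo)

def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier that sees only
    the columns switched on in `mask` (a stand-in for the neural network)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, None
        for label in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if not rows:
                continue
            centroid = [sum(r[j] for r in rows) / len(rows) for j in cols]
            dist = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
            if best_dist is None or dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def select_features(X, y, generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    """Genetic-algorithm wrapper: evolve boolean masks over feature columns,
    using classifier accuracy as the fitness. All defaults are illustrative."""
    rng = random.Random(seed)
    n = len(X[0])
    # Start from the full feature set plus random masks.
    population = [[True] * n]
    population += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    fitness = lambda m: centroid_accuracy(X, y, m)
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children  # the fitter half survives unchanged
    return max(population, key=fitness)
```

In use, each protein would contribute one row of `X` (for instance, one `combo_count` per candidate feature combination) and one class label in `y`; `select_features(X, y)` then returns the mask of combinations that the stand-in classifier found most useful. Keeping the fitter half of each generation unchanged means the best mask found so far is never lost.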

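Bibliographic record

  • Author: Kernytsky, Andrew
  • Author affiliation: Columbia University
  • Degree-granting institution: Columbia University
  • Subject: Chemistry, Biochemistry; Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 138 p.
  • Total pages: 138
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Biochemistry; automation and computer technology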