Computational Protein Function Prediction and its Application to the Missing Enzymes Problem

Abstract

The long-term goal of this research is to improve, with high accuracy, the overall annotation level of genomes and the completeness of biological pathways. Large numbers of proteins are sequenced every year, creating a pressing need for computational techniques that rapidly analyze genomes and extract relevant knowledge. The purpose of this study is 1) to develop an advanced method to computationally elucidate the functions of unannotated proteins, 2) to characterize the relationships between the functional terms used to describe proteins, and 3) to use these relationships to predict missing enzymes in metabolic pathways.

Here we have developed the Extended Similarity Group (ESG) method for protein annotation prediction, which iteratively searches the sequence homology space around the query protein and draws a consensus from the annotations of proteins in that neighborhood. In terms of prediction accuracy, ESG has been shown to outperform a simple PSI-BLAST search and the PFP method previously developed in our lab. Second, we have designed two scores, the Co-occurrence Association Score (CAS) and the PubMed Association Score (PAS), that capture the relationship between pairs of Gene Ontology terms used for annotating proteins. CAS is based on how often two annotation terms are used in the database to annotate the same proteins, and PAS is based on co-mentions of annotation terms in PubMed abstracts. These two scores have been successfully applied to identify functionally coherent groups of proteins that work in a coordinated fashion to achieve a biological task.

For newly sequenced genomes, metabolic reconstruction often leaves several missing enzymes, where a known reaction is not associated with any gene product. As the next step, we combine the aforementioned function association scores with phylogenetic profiles and microarray expression data to find the most likely matches for such missing enzymes, thereby increasing the completeness of biological knowledge. Thus the principal goal achieved here is to understand and improve the computational characterization of protein annotations, starting from individual proteins and moving towards the systems level.
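
The abstract does not give the formula for CAS, so the following is only a minimal sketch of the general idea of a co-occurrence score between annotation terms, not the thesis's definition. It assumes a generic observed-over-expected ratio: the number of proteins annotated with both terms, divided by the number expected if the two terms were assigned independently. The function name `cooccurrence_scores`, the protein names, and the term labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(annotations):
    """Score each pair of terms by observed / expected co-annotation.

    annotations maps a protein name to the collection of terms annotating it.
    This observed/expected ratio is an illustrative assumption, not the CAS
    formula from the thesis.
    """
    n_proteins = len(annotations)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        unique_terms = sorted(set(terms))  # count each term once per protein
        term_counts.update(unique_terms)
        pair_counts.update(combinations(unique_terms, 2))

    scores = {}
    for (a, b), observed in pair_counts.items():
        # Expected number of shared proteins if a and b were independent.
        expected = term_counts[a] * term_counts[b] / n_proteins
        scores[(a, b)] = observed / expected
    return scores


# Toy data with made-up protein names and placeholder term labels.
toy_annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A", "GO:B"},
    "P3": {"GO:C"},
    "P4": {"GO:A", "GO:C"},
}
toy_scores = cooccurrence_scores(toy_annotations)
```

Under this assumed ratio, a value above 1 means two terms annotate the same proteins more often than chance would suggest, and a value below 1 means less often. Pairs that never co-occur are simply absent from the result.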

Bibliographic record

  • Author: Chitale, Meghana
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Subjects: Computer science; Bioinformatics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 174
  • Original format: PDF
  • Language of text: English