
Data mining techniques for enhancing protein function prediction.


Abstract

Proteins are the most essential and versatile macromolecules of life, and knowledge of their functions is crucial for obtaining a basic understanding of the cellular processes operating in an organism, as well as for important applications in biotechnology, such as the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recent revolutions in biotechnology have given us numerous high-throughput experimental technologies that generate very useful data, such as gene expression and protein interaction data, which provide high-resolution snapshots of complex cellular processes and a novel avenue for understanding their underlying mechanisms. In particular, several computational approaches based on the principle of Guilt by Association (GBA) have been proposed, in which the function(s) of a protein are inferred from those of other proteins that are "associated" with it in these data sets. In this thesis, we have developed several novel methods for improving the performance of these approaches by making use of the unutilized and under-utilized information in genomic data sets, as well as their associated knowledge bases. In particular, we have developed pre-processing methods for handling data quality issues with gene expression (microarray) data sets and protein interaction networks that aim to enhance the utility of these data sets for protein function prediction. We have also developed a method for incorporating the inter-relationships between functional classes, as captured by the ontologies in Gene Ontology, into classification-based protein function prediction algorithms, which enabled us to improve the quality of predictions made for several functional classes, particularly those with very few member proteins (rare classes).
Finally, we have developed a novel association analysis-based biclustering algorithm to address two major challenges with traditional biclustering algorithms, namely performing an exhaustive search for all valid biclusters satisfying the definition specified by the algorithm, and finding small biclusters. This algorithm makes it possible to discover smaller biclusters that are more significantly enriched with specific GO terms than those produced by traditional biclustering algorithms. Overall, the methods proposed in this thesis are expected to help uncover the functions of several unannotated proteins (or genes), as shown by specific examples cited in some of the chapters. To conclude, we also suggest several opportunities for further progress on the very important problem of protein function prediction.
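
The Guilt by Association principle mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's method: it is a plain neighbor-voting scheme, assuming a protein interaction network stored as an adjacency dictionary. The function name `predict_functions`, the `min_votes` threshold, and the toy proteins and annotations are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def predict_functions(network, annotations, protein, min_votes=2):
    """Return the functions carried by at least `min_votes` annotated
    neighbors of `protein` (a simple Guilt-by-Association vote)."""
    votes = Counter()
    for neighbor in network.get(protein, ()):
        # Each annotated neighbor casts one vote per function it carries;
        # unannotated neighbors contribute nothing.
        votes.update(annotations.get(neighbor, ()))
    # Sort so the result is deterministic.
    return sorted(func for func, count in votes.items() if count >= min_votes)


# Toy, made-up data: P4 is unannotated and interacts with P1, P2, and P3.
network = {
    "P1": {"P4"},
    "P2": {"P4"},
    "P3": {"P4"},
    "P4": {"P1", "P2", "P3"},
}
annotations = {
    "P1": {"kinase activity", "signal transduction"},
    "P2": {"kinase activity"},
    "P3": {"signal transduction", "transport"},
}

predicted = predict_functions(network, annotations, "P4")
print(predicted)
```

In this toy network, "kinase activity" and "signal transduction" each receive two votes and are predicted for P4, while "transport" receives only one and is dropped. A threshold like this is one simple way to limit the effect of noisy interactions. The abstract names such data quality issues as a target of the thesis's pre-processing methods, which are not reproduced here.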

Bibliographic record

  • Author

    Pandey, Gaurav

  • Author affiliation

    University of Minnesota

  • Degree-granting institution University of Minnesota
  • Subjects Biology Molecular; Computer Science; Biology Bioinformatics
  • Degree Ph.D.
  • Year 2010
  • Pages 194 p.
  • Total pages 194
  • Original format PDF
  • Text language eng
