Exploring protein functional relationships utilizing genomic information, data mining and computational intelligence.

Abstract

Since the discovery of the Overhauser effect and of the DNA double-helix structure, many protein structures have been determined experimentally, in particular by exploiting the nuclear Overhauser effect. Biologists can now not only describe biological phenomena but also seek to understand the mechanisms of life at the molecular level. With the advent of high-throughput genome sequencing, more and more genomes have become available, and our ability to sequence genomes has outstripped our ability to analyze the resulting data to determine the functions and structures of the proteins they encode. Determining protein structures and functions with traditional laboratory methods is slow and expensive. Our goal, therefore, is to develop an automated, machine-learning-based approach that uses computational intelligence to provide information about multiple functional relationships among a large group of proteins simultaneously.

To date, the functions of most proteins are either entirely unknown or only partially known. This reflects the complexity of protein-protein and protein-DNA interactions as well as the limitations of experimental approaches and data mining techniques. Nevertheless, our new approach can extract information about protein functional relationships by performing a hierarchical decomposition of the feature space. This decomposition transforms a difficult problem into simpler sub-problems, so that complex biomedical data can be used efficiently to solve them. We call this approach the Unsupervised and Supervised Tree (UST) because it combines the advantages of both supervised and unsupervised learning.
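The abstract does not describe how the tree is actually built, so the following is only a minimal sketch of the general "split the feature space, then solve each piece" idea, under stated assumptions: k-means (our stand-in, not the author's method) partitions the feature space into regions, and a simple nearest-class-centroid classifier is trained inside each region. The names `kmeans`, `ClusterThenClassify`, `mean`, `nearest`, and the parameter `k` are hypothetical, and the toy data are invented for illustration.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def mean(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest(centroids: Dict[int, Vector], x: Vector) -> int:
    """Key of the centroid closest to x (Euclidean distance)."""
    return min(centroids, key=lambda i: dist(centroids[i], x))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> Dict[int, Vector]:
    """Plain Lloyd's k-means, deterministically seeded with the first k points."""
    centroids = {i: tuple(points[i]) for i in range(k)}
    for _ in range(iters):
        groups: Dict[int, List[Vector]] = defaultdict(list)
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Move each centroid to the mean of its members; clusters that emptied are dropped.
        centroids = {i: mean(members) for i, members in groups.items()}
    return centroids


class ClusterThenClassify:
    """Hypothetical sketch: unsupervised split of the feature space, then a small supervised model per region."""

    def __init__(self, k: int):
        self.k = k

    def fit(self, X: Sequence[Vector], y: Sequence[Hashable]) -> "ClusterThenClassify":
        self.centroids = kmeans(X, self.k)
        by_region: Dict[int, Dict[Hashable, List[Vector]]] = defaultdict(lambda: defaultdict(list))
        for x, label in zip(X, y):
            by_region[nearest(self.centroids, x)][label].append(x)
        # Within each region, a nearest-class-centroid classifier: one mean vector per label.
        self.models = {r: {lab: mean(xs) for lab, xs in labs.items()} for r, labs in by_region.items()}
        return self

    def predict(self, x: Vector) -> Hashable:
        # Route x to the closest region that received training data, then classify locally.
        region = nearest({r: self.centroids[r] for r in self.models}, x)
        model = self.models[region]
        return min(model, key=lambda lab: dist(model[lab], x))


# Toy data: two well-separated groups, each containing two classes.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = ["a", "a", "b", "b", "c", "c", "d", "d"]
model = ClusterThenClassify(k=2).fit(X, y)
print(model.predict((0.1, 0.5)))    # a
print(model.predict((10.9, 10.2)))  # d
```

Splitting first means each per-region classifier only has to separate the classes that actually occur in its region, which is one way a hard problem can be reduced to simpler sub-problems. A real system would substitute its own partitioning rule and local learner for these placeholders.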
The core of UST is the construction of a Maximum Contract Tree (MCT), which allows us to establish many links among proteins with related functions.

We also introduce a new machine learning classifier, the Multiple-Labeled Instance Classifier (MLIC), which handles instances that belong to several classes at once, a setting that previous computational intelligence approaches had not studied.

We built the most comprehensive protein phylogenetic profile library, based on 60 genomes, improving on earlier protein phylogenetic profiles that were based on 24 genomes. Experimental results show that USTs outperform other computational intelligence methods, such as Support Vector Machines and Decision Trees, and provide a viable alternative to using supervised or unsupervised methods alone.
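A phylogenetic profile records, for each protein, whether a homolog is found in each genome of a reference set; proteins with identical or very similar profiles are candidates for a functional link. The sketch below is illustrative only: it assumes homolog detection has already been done upstream, uses six hypothetical genomes and hypothetical protein names instead of the 60-genome library, and compares profiles by Hamming distance, one common choice that may differ from the measure used in this work. The helpers `build_profile`, `hamming`, and `linked_pairs` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def build_profile(present_in: Set[str], genomes: List[str]) -> Profile:
    """1 where a homolog was detected in that genome, 0 otherwise (detection assumed done upstream)."""
    return tuple(int(g in present_in) for g in genomes)


def hamming(a: Profile, b: Profile) -> int:
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles: Dict[str, Profile], max_dist: int = 0) -> List[Tuple[str, str]]:
    """Protein pairs whose profiles differ in at most `max_dist` genomes."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_dist
    ]


# Toy data: hypothetical genomes and proteins, not taken from the dissertation.
GENOMES = ["G1", "G2", "G3", "G4", "G5", "G6"]
HITS = {
    "P1": {"G1", "G2", "G5"},
    "P2": {"G1", "G2", "G5"},
    "P3": {"G3", "G4"},
    "P4": {"G1", "G2"},
}
PROFILES = {name: build_profile(hits, GENOMES) for name, hits in HITS.items()}

print(linked_pairs(PROFILES))              # [('P1', 'P2')]
print(linked_pairs(PROFILES, max_dist=1))  # [('P1', 'P2'), ('P1', 'P4'), ('P2', 'P4')]
```

Adding genomes lengthens every profile, so chance agreement between unrelated proteins becomes less likely; this is one reason a larger genome set can yield more informative profiles.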