Multiple Kernel Learning for Gene Prioritization, Clustering, and Functional Enrichment Analysis.

Abstract

Gene prioritization is the process of ranking a list of candidate genes such that the genes that are most likely involved in a biological process of interest receive the highest rankings. In a supervised learning approach to gene prioritization, candidate genes are ranked in terms of their degree of similarity to genes that have already been shown to be involved in the process of interest. Gene prioritization thus can be cast as a classification task, in which a training set of genes and data associated with those genes is used to train a classifier to assign rankings to unknown genes, based on their degree of similarity to the training genes. This thesis describes the use of kernel methods, and particularly a method known as multiple kernel learning, for combining information from multiple data sources for purposes of gene prioritization. Multiple kernel learning facilitates the incorporation of heterogeneous data types into the assessment of similarity among genes. In addition, the rows of the kernel matrix can be repurposed as feature vectors. We apply clustering methods to these vectors to partition the gene list into related groups. We then perform functional enrichment analysis on the gene clusters to identify biological functions that are significantly represented in each gene cluster. We thus are able to use a single data structure, namely a kernel matrix representing similarities among genes based on multiple information sources, as the basis for three common types of bioinformatics analysis: gene prioritization, gene clustering, and functional annotation analysis of gene lists. This research contributes to the exploration of methods for extracting useful biological insights from the continually expanding knowledge base of biological data.
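
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy kernel values, and the fixed equal weights are all assumptions made for illustration. Multiple kernel learning would learn the kernel weights from the training genes, whereas here they are simply supplied by hand. The sketch combines two kernel matrices, ranks candidate genes by their mean similarity to the training genes, and computes a hypergeometric p-value of the kind commonly used in functional enrichment analysis. Clustering of the kernel rows is omitted for brevity.

```python
from math import comb


def combine_kernels(kernels, weights):
    """Weighted sum of equally sized square kernel matrices (lists of lists).

    Assumption: weights are fixed by hand; multiple kernel learning
    would learn them from the training data instead.
    """
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


def prioritize(kernel, training_idx, candidate_idx):
    """Rank candidates by mean similarity to the training genes, highest first."""
    scores = {
        c: sum(kernel[c][t] for t in training_idx) / len(training_idx)
        for c in candidate_idx
    }
    return sorted(candidate_idx, key=lambda c: scores[c], reverse=True)


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Hypergeometric upper tail: P(at least `hits` annotated genes in the cluster)."""
    upper = min(annotated, cluster_size)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, cluster_size)


# Hypothetical toy data: 4 genes, two data sources (values are made up).
k_expression = [
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.5, 0.2],
    [0.6, 0.5, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
k_sequence = [
    [1.0, 0.6, 0.7, 0.2],
    [0.6, 1.0, 0.9, 0.1],
    [0.7, 0.9, 1.0, 0.2],
    [0.2, 0.1, 0.2, 1.0],
]

combined = combine_kernels([k_expression, k_sequence], [0.5, 0.5])

# Genes 0 and 1 are known to be involved; genes 2 and 3 are candidates.
ranking = prioritize(combined, training_idx=[0, 1], candidate_idx=[2, 3])

# A cluster of 3 genes, all 3 carrying an annotation held by 3 of 10 genes overall.
p_value = enrichment_p_value(population=10, annotated=3, cluster_size=3, hits=3)
```

Each row of `combined` can also be read as a feature vector for one gene, which is the reuse of the kernel matrix that the abstract describes for clustering.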

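Bibliographic record

  • Author: Millis, David H.
  • Author affiliation: George Mason University
  • Degree-granting institution: George Mason University
  • Subject: Biology Bioinformatics
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 128 p.
  • Total pages: 128
  • Original format: PDF
  • Language: English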