
Learning with multiple kernels: Semidefinite programming, duality, efficient optimization and applications in computational biology.

Abstract

An important challenge for the field of machine learning is to leverage the diversity of information available in large-scale learning problems, in which different sources of information often capture different aspects of the data. Beyond classical vectorial data formats, information in the form of graphs, trees, strings and more has become widely available. For example, in computational biology many such sources of information about genes and proteins are now available: sequence, expression, protein and regulation information. More data types will become available in the near future, such as array-based fitness profiles and protein-protein interaction data from mass spectrometry.

Recent work in computational biology (such as gene function prediction, prediction of protein structure and localization, and inference of regulatory and metabolic networks) could benefit significantly from an approach that treats the different types of information in a unified way, merging them into a single representation rather than using only the description judged most relevant to the task at hand.

This thesis introduces a principled computational and statistical framework for integrating data from heterogeneous information sources in a flexible and unified way. The approach is formulated within the unifying learning framework of kernel methods and applied to the specific case of classification. Each data set is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and provides a principled framework in which many types of data can be represented, including vectors, strings, trees and graphs.

The resulting formulation takes the form of a semidefinite programming (SDP) problem.
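To make the SDP claim concrete, the display below writes out a trace-constrained kernel-learning problem in the form commonly used in the multiple-kernel-learning literature. It is a sketch under stated assumptions, not the formulation copied from the thesis, which may differ in details. The notation is ours: $K_1,\dots,K_m$ are the kernel matrices of the individual data sources on $n$ training points, $\mu_i$ their weights, $c$ a fixed trace, $y \in \{\pm 1\}^n$ the labels, $e$ the all-ones vector, $C$ the soft-margin parameter, and $\omega(K)$ the optimal value of the 1-norm soft-margin SVM dual for kernel $K$.

```latex
\begin{aligned}
&\min_{\mu \in \mathbb{R}^m}\ \omega(K)
  \quad \text{s.t.}\quad K = \sum_{i=1}^{m} \mu_i K_i,\quad K \succeq 0,\quad \operatorname{tr}(K) = c,\\[4pt]
&\omega(K) = \max_{\alpha \in \mathbb{R}^n}\ 2\alpha^{\top} e - \alpha^{\top} G(K)\,\alpha
  \quad \text{s.t.}\quad 0 \le \alpha \le C,\quad \alpha^{\top} y = 0,\\[4pt]
&G(K) = \operatorname{diag}(y)\, K\, \operatorname{diag}(y).
\end{aligned}
```

Replacing the inner maximization by its dual and applying a Schur complement turns the objective into a linear matrix inequality in $K$; together with the constraint $K \succeq 0$, this is what gives the problem its SDP form.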
Although SDPs can be solved in polynomial time, the scale of many real-life problems is often beyond the reach of general-purpose SDP algorithms. Using tools from conic duality and convex analysis, a dedicated algorithm is derived that, in this setting, is significantly more efficient than generic SDP methods.

Finally, applications to computational biology are presented, showing that classification performance can be enhanced by integrating diverse genome-wide information sources.
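The central idea of the abstract is one kernel per data source, merged into a single representation. A small sketch can illustrate it. The code below is hypothetical and is not the thesis's implementation. It builds a Gaussian (RBF) kernel on toy expression-like vectors and a k-spectrum kernel on toy protein-like sequences, rescales each to trace n, and combines them as a nonnegative weighted sum. The function names, the toy data, and the choice of RBF and spectrum kernels are all assumptions made for illustration. The weights are fixed by hand here, whereas in the framework described above they would be learned by solving the optimization problem.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not the thesis's code.

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel on real-valued vectors, e.g. expression profiles."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def spectrum_kernel(seqs, k=2):
    """k-spectrum kernel on strings: inner product of k-mer count vectors."""
    counts = [Counter(s[i:i + k] for i in range(len(s) - k + 1)) for s in seqs]
    return [[float(sum(c1[m] * c2[m] for m in c1)) for c2 in counts] for c1 in counts]


def trace_normalize(K):
    """Rescale K so that trace(K) equals the number of entities n."""
    n = len(K)
    tr = sum(K[i][i] for i in range(n))
    return [[v * n / tr for v in row] for row in K]


def combine(kernels, weights):
    """Nonnegative weighted sum K = sum_i mu_i * K_i."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative to keep K positive semidefinite")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


def is_psd(K, jitter=1e-9):
    """Test positive semidefiniteness by attempting a Cholesky factorization of K + jitter*I."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                s += jitter  # small diagonal shift tolerates rounding on singular PSD matrices
                if s <= 0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Toy data for four hypothetical proteins: two views of the same entities.
expression = [[0.1, 1.2], [0.0, 1.0], [2.0, -0.5], [1.8, -0.4]]
sequences = ["MKVLA", "MKVIA", "GGSAG", "GGTAG"]

K_expr = trace_normalize(rbf_kernel(expression))
K_seq = trace_normalize(spectrum_kernel(sequences, k=2))
K = combine([K_expr, K_seq], [0.5, 0.5])  # weights fixed by hand here, not learned

print(is_psd(K), sum(K[i][i] for i in range(len(K))))  # prints: True 4.0
```

The nonnegative weights keep the combination valid, because a nonnegative sum of positive semidefinite matrices is itself positive semidefinite. With each kernel rescaled to trace n and the weights summing to one, the combined kernel also has trace n.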

Bibliographic details

  • Author affiliation: University of California, Berkeley.
  • Degree-granting institution: University of California, Berkeley.
  • Subject: Engineering, Electronics and Electrical; Artificial Intelligence.
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 116
  • Original format: PDF
  • Language: English