
Algorithms and Analysis for Multi-Category Classification



Abstract

Classification problems in machine learning involve assigning labels to various kinds of output types, from single-assignment binary and multi-class classification to more complex assignments such as category ranking, sequence identification, and structured-output classification. Traditionally, most machine learning algorithms and theory are developed for the binary setting. In this dissertation, we provide a framework that unifies these problems. Through this framework, many algorithms and much of the theoretical understanding developed in the binary domain are extended to more complex settings.

First, we introduce Constraint Classification, a learning framework that provides a unified view of complex-output problems. Within this framework, each complex-output label is viewed as a set of constraints sufficient to capture the information needed to classify the example. Prediction in the complex-output setting is thus reduced to determining which constraints, out of a potentially large set, hold for a given example, a task that can be accomplished by repeatedly applying a single binary classifier to indicate whether or not each constraint holds. Using this insight, we provide a principled extension of binary learning algorithms, such as the support vector machine and the Perceptron algorithm, to the complex-output domain. We also show that desirable theoretical and experimental properties of these algorithms are maintained in the new setting.

Second, we address the structured-output problem directly. Structured-output labels are collections of variables corresponding to a known structure, such as a tree, graph, or sequence, that can bias or constrain the global output assignment. The traditional approach to learning structured-output classifiers, which decomposes a structured output into multiple localized labels learned independently, is theoretically sub-optimal. In contrast, recent methods such as constraint classification, which learn functions that directly classify the global output, can achieve optimal performance. Surprisingly, in practice it is unclear which methods achieve state-of-the-art performance. In this work, we study under what circumstances each method performs best. With enough time, training data, and representational power, the global approaches are better. However, we also show, both theoretically and experimentally, that learning a suite of local classifiers, even sub-optimal ones, can produce the best results in many real-world settings.

Third, we address an important algorithm in machine learning, the maximum margin classifier. Even with a conceptual understanding of how to extend maximum margin algorithms to more complex settings, and with performance guarantees for large margin classifiers, complex outputs render traditional approaches intractable. We introduce a new algorithm for learning maximum margin classifiers that uses coresets to find a provably approximate solution to the maximum margin separating hyperplane. Through the constraint classification framework, this algorithm applies directly to all of the previously mentioned complex-output domains. In addition, coresets motivate approximate algorithms for active learning and for learning in the presence of outlier noise, for which we give simple, elegant, and previously unknown proofs of effectiveness.
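To make the constraint-classification reduction concrete, here is a minimal sketch (not taken from the dissertation) of the idea in the simplest complex-output case, multi-class classification with linear scoring functions: the label y of an example x is read as the set of pairwise constraints w_y · x > w_c · x for every competing class c, and a Perceptron-style update is applied whenever one of these constraints is violated. All names and parameters are illustrative.

```python
import numpy as np

def train_constraint_perceptron(X, y, num_classes, epochs=10):
    """Perceptron-style learner for multi-class constraint classification.

    Each labeled example (x, y) is read as the set of constraints
    w_y . x > w_c . x for every competing class c != y.  Whenever a
    constraint is violated, the true class is promoted and the
    offending class is demoted, mirroring a binary Perceptron update
    on the difference of the two score vectors.
    """
    n_features = X.shape[1]
    W = np.zeros((num_classes, n_features))    # one weight vector per class
    for _ in range(epochs):
        for x, label in zip(X, y):
            scores = W @ x
            for c in range(num_classes):
                if c != label and scores[c] >= scores[label]:
                    W[label] += x              # promote the true class
                    W[c] -= x                  # demote the violating class
    return W

def predict(W, x):
    """Assign the class whose constraints all hold, i.e. the highest score."""
    return int(np.argmax(W @ x))
```

The same reduction carries over to ranking and structured outputs by enumerating the appropriate constraint set instead of all class pairs.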
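The coreset-based maximum margin idea admits a similarly compact sketch. The version below is a hedged illustration of the greedy construction suggested by the abstract, not the dissertation's algorithm: a linear SVM (scikit-learn's SVC with a large C, as a stand-in hard-margin solver) is repeatedly retrained on a small working set, and the training point that most violates the current margin is added until every point attains a (1 − eps) fraction of the coreset margin. Labels are assumed to be ±1; the stopping rule, tolerance, and budget are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC   # stand-in max-margin solver

def greedy_margin_coreset(X, y, eps=0.1, max_size=50):
    """Greedily build a small coreset whose max-margin separator
    approximately achieves the margin on the full data set.

    X : (n, d) array of features; y : (n,) array of labels in {-1, +1}.
    """
    # seed the coreset with one example of each class so the solver can fit
    core = [int(np.flatnonzero(y == c)[0]) for c in np.unique(y)]
    clf = None
    while len(core) < max_size:
        clf = SVC(kernel='linear', C=1e6)                  # large C ~ hard margin
        clf.fit(X[core], y[core])
        w_norm = np.linalg.norm(clf.coef_)
        margins = y * clf.decision_function(X) / w_norm    # geometric margins
        core_margin = margins[core].min()
        worst = int(np.argmin(margins))
        if margins[worst] >= (1 - eps) * core_margin:
            break                                          # all points (1-eps)-satisfied
        core.append(worst)                                 # add the most violated point
    return clf, core
```

The appeal of the construction is that retraining cost depends on the coreset size rather than on the full data set, which is also what makes the active-learning and outlier-noise variants mentioned in the abstract natural.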

Bibliographic record

  • Author: Zimak, Dav A.
  • Affiliation:
  • Year: 2006
  • Total pages:
  • Format: PDF
  • Language: English
  • CLC classification:
