Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost.


Abstract

The past decade has witnessed the field of machine learning establish itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted, and cost must be strictly accounted for at test time. A typical question is: if an application consumes x additional units of cost at test time but improves accuracy by y percent, should the additional x units of resources be allocated? At the core of this question is a trade-off between accuracy and cost. In this thesis, we examine the components of test-time cost and develop different strategies to manage this trade-off.

We first investigate test-time cost and find that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, particularly for nonparametric algorithms. We then propose three strategies that explicitly trade off accuracy against these two components of test-time cost during classifier training.

To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser incorporates extraction-cost information during classifier training to explicitly minimize test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVMs) into test-time cost-sensitive anytime classifiers.
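The accuracy-versus-cost question above can be made concrete with a minimal toy sketch, not the thesis's actual algorithms: among candidate feature subsets, pick the one maximizing accuracy minus a budget multiplier λ times extraction cost. All names and numbers here are illustrative assumptions.

```python
# Toy accuracy/cost trade-off: each candidate feature subset has an
# (assumed) validation accuracy and a feature extraction cost; lam encodes
# how strictly the test-time budget is enforced.

def best_subset(candidates, lam):
    """candidates: list of (name, accuracy, cost); maximize accuracy - lam * cost."""
    return max(candidates, key=lambda c: c[1] - lam * c[2])

candidates = [
    ("cheap",     0.80,  1.0),   # few, inexpensive features
    ("balanced",  0.88,  5.0),
    ("expensive", 0.90, 50.0),   # +y% accuracy at large extra cost x
]

# A tight budget (large lam) favors cheap features; a loose budget
# (small lam) lets accuracy dominate.
print(best_subset(candidates, lam=0.1)[0])     # → cheap
print(best_subset(candidates, lam=0.0001)[0])  # → expensive
```

The single multiplier λ is only a stand-in for the richer budgeted formulations developed in the thesis, but it shows why the same model family can yield different deployed classifiers under different test-time budgets.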
GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.

We then introduce the Cost-Sensitive Tree of Classifiers (CSTC) and the Cost-Sensitive Cascade of Classifiers (CSCC), which share a common strategy of trading off accuracy against amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree-traversal paths, each of which is optimized for a specific sub-partition of the input space and extracts a different, specialized subset of features. CSCC extends CSTC by building a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Because both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while high accuracy is maintained. Both approaches outperform the current state of the art on real-world data sets.

To trade off accuracy against the high classifier evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVMs), whose test-time evaluation cost is typically substantial when they are learned from large training sets. CVM is a post-processing algorithm that compresses a learned SVM model by reducing and optimizing its support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing test-time evaluation cost by several orders of magnitude.
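The mechanism behind support-vector compression can be sketched in a few lines. This is a deliberately simplistic stand-in, not CVM itself: CVM jointly reduces and re-optimizes the retained vectors, whereas this sketch merely prunes the smallest-magnitude dual coefficients to show why fewer support vectors mean proportionally fewer kernel evaluations at test time. All data here is randomly generated for illustration.

```python
import math
import random

def rbf(x, z, gamma=1.0):
    """Gaussian RBF kernel between two equal-length tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision(x, svs, alphas, b=0.0):
    """Kernel SVM decision value: one kernel evaluation per support vector."""
    return sum(a * rbf(x, sv) for a, sv in zip(alphas, svs)) + b

def prune(svs, alphas, k):
    """Keep only the k support vectors with the largest |alpha|."""
    ranked = sorted(zip(alphas, svs), key=lambda t: -abs(t[0]))[:k]
    a, s = zip(*ranked)
    return list(s), list(a)

random.seed(0)
svs = [(random.random(), random.random()) for _ in range(200)]
alphas = [random.gauss(0.0, 1.0) for _ in range(200)]

x = (0.5, 0.5)
full = decision(x, svs, alphas)                     # 200 kernel evaluations
small_svs, small_alphas = prune(svs, alphas, 20)    # 10x fewer evaluations
compressed = decision(x, small_svs, small_alphas)
print(full, compressed)
```

Test-time evaluation cost scales linearly with the number of support vectors, which is why reducing them (and, in CVM, optimizing the survivors rather than naively pruning as above) can cut evaluation cost by orders of magnitude.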

Bibliographic Information

  • Author

    Xu, Zhixiang Eddie

  • Institution

    Washington University in St. Louis

  • Degree grantor: Washington University in St. Louis
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 115 p.
  • Total pages: 115
  • Format: PDF
  • Language: English
