Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost.


Abstract

The past decade has witnessed the field of machine learning establish itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted, and cost must be strictly accounted for at test time. A typical question is: if an application consumes x additional units of cost at test time but improves accuracy by y percent, should the additional x units of resources be allocated? At the core of this question is a trade-off between accuracy and cost. In this thesis, we examine the components of test-time cost and develop different strategies to manage this trade-off.

We first investigate test-time cost and find that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, particularly for nonparametric algorithms. We then propose three strategies that explicitly trade off accuracy against these two components of test-time cost during classifier training.

To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser incorporates extraction-cost information during classifier training to explicitly minimize test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVMs) into test-time cost-sensitive anytime classifiers.
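The accuracy-versus-cost question above can be made concrete with a minimal toy sketch, not the thesis's actual algorithms: among candidate feature subsets, pick the one maximizing accuracy minus a budget multiplier λ times extraction cost. All names and numbers here are illustrative assumptions.

```python
# Toy accuracy/cost trade-off: each candidate feature subset has an
# (assumed) validation accuracy and a feature extraction cost; lam encodes
# how strictly the test-time budget is enforced.

def best_subset(candidates, lam):
    """candidates: list of (name, accuracy, cost); maximize accuracy - lam * cost."""
    return max(candidates, key=lambda c: c[1] - lam * c[2])

candidates = [
    ("cheap",     0.80,  1.0),   # few, inexpensive features
    ("balanced",  0.88,  5.0),
    ("expensive", 0.90, 50.0),   # +y% accuracy at large extra cost x
]

# A tight budget (large lam) favors cheap features; a loose budget
# (small lam) lets accuracy dominate.
print(best_subset(candidates, lam=0.1)[0])     # → cheap
print(best_subset(candidates, lam=0.0001)[0])  # → expensive
```

The single multiplier λ is only a stand-in for the richer budgeted formulations developed in the thesis, but it shows why the same model family can yield different deployed classifiers under different test-time budgets.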
GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.

We then introduce the Cost-Sensitive Tree of Classifiers (CSTC) and the Cost-Sensitive Cascade of Classifiers (CSCC), which share a common strategy of trading off accuracy against amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree-traversal paths, each of which is optimized for a specific sub-partition of the input space and extracts a different, specialized subset of features. CSCC extends CSTC by building a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Because both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while high accuracy is maintained. Both approaches outperform the current state of the art on real-world data sets.

To trade off accuracy against the high classifier evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVMs), whose test-time evaluation cost is typically substantial when they are learned from large training sets. CVM is a post-processing algorithm that compresses a learned SVM model by reducing and optimizing its support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing test-time evaluation cost by several orders of magnitude.
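The mechanism behind support-vector compression can be sketched in a few lines. This is a deliberately simplistic stand-in, not CVM itself: CVM jointly reduces and re-optimizes the retained vectors, whereas this sketch merely prunes the smallest-magnitude dual coefficients to show why fewer support vectors mean proportionally fewer kernel evaluations at test time. All data here is randomly generated for illustration.

```python
import math
import random

def rbf(x, z, gamma=1.0):
    """Gaussian RBF kernel between two equal-length tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision(x, svs, alphas, b=0.0):
    """Kernel SVM decision value: one kernel evaluation per support vector."""
    return sum(a * rbf(x, sv) for a, sv in zip(alphas, svs)) + b

def prune(svs, alphas, k):
    """Keep only the k support vectors with the largest |alpha|."""
    ranked = sorted(zip(alphas, svs), key=lambda t: -abs(t[0]))[:k]
    a, s = zip(*ranked)
    return list(s), list(a)

random.seed(0)
svs = [(random.random(), random.random()) for _ in range(200)]
alphas = [random.gauss(0.0, 1.0) for _ in range(200)]

x = (0.5, 0.5)
full = decision(x, svs, alphas)                     # 200 kernel evaluations
small_svs, small_alphas = prune(svs, alphas, 20)    # 10x fewer evaluations
compressed = decision(x, small_svs, small_alphas)
print(full, compressed)
```

Test-time evaluation cost scales linearly with the number of support vectors, which is why reducing them (and, in CVM, optimizing the survivors rather than naively pruning as above) can cut evaluation cost by orders of magnitude.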

Bibliographic Information

  • Author

    Xu, Zhixiang Eddie

  • Institution

    Washington University in St. Louis

  • Degree grantor: Washington University in St. Louis
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 115 p.
  • Total pages: 115
  • Format: PDF
  • Language: English
