The Intelligent Management of Crowd-Powered Machine Learning


Abstract

Artificial intelligence and machine learning power many technologies today, from spam filters to self-driving cars to medical decision assistants. While this revolution has hugely benefited from algorithmic developments, it also could not have occurred without data, which nowadays is frequently procured at massive scale from crowds. Because data is so crucial, a key next step towards truly autonomous agents is the design of better methods for intelligently managing now-ubiquitous crowd-powered data-gathering processes. This dissertation takes this key next step by developing algorithms for the online and dynamic control of these processes. We consider how to gather data for its two primary purposes: training and evaluation.

In the first part of the dissertation, we develop algorithms for obtaining data for testing. The most important requirement of testing data is that it must be extremely clean. Thus, to deal with noisy human annotations, machine learning practitioners typically rely on careful workflow design and advanced statistical techniques for label aggregation. A common process involves designing and testing multiple crowdsourcing workflows for a task, identifying the single best-performing workflow, and then aggregating worker responses from redundant runs of that single workflow. We improve upon this process by building two control models: one that allows for switching between many workflows depending on how well a particular workflow is performing for a given example and worker; and one that can aggregate labels from tasks that do not have a finite predefined set of multiple-choice answers (e.g., counting tasks). We then implement agents that use our new models to dynamically choose whether to acquire more labels from the crowd or stop, and show that they can produce higher-quality labels at a lower cost than state-of-the-art baselines.

In the second part of the dissertation, we shift to tackle the second purpose of data: training. Because learning algorithms are often robust to noise, training sets do not necessarily have to be clean, and their requirements are more complex. We first investigate a tradeoff between size and noise. We survey how inductive bias, worker accuracy, and budget affect whether a larger and noisier training set or a smaller and cleaner one will train better classifiers. We then set up a formal framework for dynamically choosing the next example to label or relabel by generalizing active learning to allow for relabeling, which we call re-active learning, and we design new algorithms for re-active learning that outperform active learning baselines. Finally, we leave the noisy setting and investigate how to collect balanced training sets in domains of varying skew, by considering a setting in which workers can not only label examples, but also generate examples with various distributions. We design algorithms that can intelligently switch between deploying these various worker tasks depending on the skew in the dataset, and show that our algorithms can result in significantly better performance than state-of-the-art baselines.
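The idea of dynamically choosing whether to acquire another label or stop can be illustrated with a minimal sketch. This is not the dissertation's control model: it assumes binary labels, a single known worker accuracy, and a simple posterior-confidence stopping rule, and every name in it (`posterior_positive`, `label_adaptively`, `make_noisy_worker`, `threshold`, `max_votes`) is hypothetical.

```python
import random


def posterior_positive(votes, accuracy, prior=0.5):
    """Posterior probability that the true label is 1.

    Assumption: each vote is independently correct with probability
    `accuracy`, regardless of the true label.
    """
    like_pos = prior
    like_neg = 1.0 - prior
    for v in votes:
        like_pos *= accuracy if v == 1 else 1.0 - accuracy
        like_neg *= accuracy if v == 0 else 1.0 - accuracy
    return like_pos / (like_pos + like_neg)


def label_adaptively(ask_worker, accuracy=0.7, threshold=0.95, max_votes=15):
    """Request votes one at a time until the posterior is confident enough.

    `ask_worker` is a hypothetical callable returning 0 or 1.
    Returns (predicted_label, number_of_votes_used).
    """
    votes = []
    while len(votes) < max_votes:
        p = posterior_positive(votes, accuracy)
        if max(p, 1.0 - p) >= threshold:
            break  # confident enough: stop paying for labels
        votes.append(ask_worker())
    p = posterior_positive(votes, accuracy)
    return (1 if p >= 0.5 else 0), len(votes)


def make_noisy_worker(true_label, accuracy, rng):
    """Simulated worker that answers correctly with probability `accuracy`."""
    def ask():
        return true_label if rng.random() < accuracy else 1 - true_label
    return ask


if __name__ == "__main__":
    rng = random.Random(0)
    worker = make_noisy_worker(true_label=1, accuracy=0.7, rng=rng)
    print(label_adaptively(worker, accuracy=0.7))
```

Under these assumptions, examples on which workers agree stop early, while contested examples consume more of the budget, which is the intuition behind spending redundancy adaptively instead of using a fixed number of labels per example.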


Bibliographic record

Author: Lin, Christopher H.
Author affiliation: University of Washington
Degree-granting institution: University of Washington
Subject: Computer science; Artificial intelligence
Degree: Ph.D.
Year: 2017
Pages: 175 p.
Original format: PDF
Language: English
Date indexed: 2022-08-17 11:54:26

