An Empirical Evaluation of Active Learning and Selective Sampling Variations Supporting Large Corpus Labeling.

Abstract

A constant challenge to researchers is the lack of large and timely datasets of domain examples (research corpora) used for training and testing their latest algorithms. Corpus examples are often annotated with special labels that represent class categories, numeric predictions, etc., depending on the research problem. While acquiring large numbers of examples is often not difficult, ensuring that each is correctly and consistently labeled certainly can be. Human experts may be required to visually inspect, annotate, and cross-check each example to guarantee its accuracy. Unfortunately, the costs incurred in performing this adjudication have led to a shortage of labeled corpora, particularly bigger and more recent ones. The primary goal of our research has been to determine how larger volumes of examples could be autonomously annotated to create more substantial datasets using a minimum of human intervention while maintaining acceptable levels of labeling accuracy.

We chose a form of Machine Learning, Active Learning, as the basis for building a suite of automated corpus labeling tools. Our labelers start with a few pre-labeled examples and a larger number of unlabeled examples. They then iteratively select small batches of these examples for labeling by an "oracle", which may be a live human expert or some other authoritative source. This "selective sampling" step picks those queries which the tools themselves think would enhance their future labeling predictions. Once the labelers have been trained, the learning iterations cease and the rest of the unlabeled examples in a corpus can be confidently labeled without additional human intervention.

To sample the most informative queries we began with the well-known Uncertainty Sampling (US) technique. However, US can be computationally expensive, and so we have proposed a new variant, Approximate Uncertainty Sampling (AUS), that is nearly as effective, but which has lower complexity costs and much less processing overhead. These reductions allow AUS to select queries more frequently and support other types of computation during labeling. In this way AUS encourages the building of larger and more topical corpora for the research communities that require them.
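As a concrete illustration of the pool-based loop the abstract describes, below is a minimal Python sketch of selective sampling with the standard least-confidence Uncertainty Sampling criterion; the dissertation's AUS variant is not specified in this record, so only the US baseline is shown. The synthetic dataset, the logistic-regression labeler, and the batch/round sizes are illustrative assumptions, and the held-out true labels simply stand in for the oracle.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Illustrative stand-ins: a synthetic "corpus" and a simple probabilistic labeler.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    rng = np.random.default_rng(0)

    labeled = list(rng.choice(len(X), size=10, replace=False))   # a few pre-labeled seed examples
    pool = [i for i in range(len(X)) if i not in set(labeled)]   # the unlabeled pool

    model = LogisticRegression(max_iter=1000)
    BATCH, ROUNDS = 20, 10                                       # assumed query-batch size and iteration count

    for _ in range(ROUNDS):
        model.fit(X[labeled], y[labeled])
        proba = model.predict_proba(X[pool])
        uncertainty = 1.0 - proba.max(axis=1)        # least confidence: low top-class probability = informative
        batch = np.argsort(uncertainty)[-BATCH:]     # selective sampling: pick the most uncertain queries
        for q in sorted(batch.tolist(), reverse=True):
            labeled.append(pool.pop(q))              # the "oracle" answer is the held-out true label in y

    # Learning iterations cease; the trained labeler annotates the rest of the corpus on its own.
    model.fit(X[labeled], y[labeled])
    auto_labels = model.predict(X[pool])
    print(f"oracle-labeled: {len(labeled)}, automatically labeled: {len(auto_labels)}")

Per the abstract, AUS trades some of the exactness of this selection step for lower per-query cost, allowing queries to be issued more frequently; its precise scoring rule is not given in this record.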

Bibliographic record

  • Author: Markowitz, Theodore J.
  • Affiliation: Pace University.
  • Degree-granting institution: Pace University.
  • Subject: Computer Science.
  • Degree: D.P.S.
  • Year: 2011
  • Pagination: 182 p.
  • Total pages: 182
  • Format: PDF
  • Language: eng
