Crowdsourcing the acquisition of natural language corpora: Methods and observations

Abstract

We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.

Bibliographic details

Source
2012 IEEE Workshop on Spoken Language Technology | 2012 | pp. 73-78 | 6 pages
Conference location: Miami, FL (US)
Authors
Wang William Yang; Bohus Dan; Kamar Ece; Horvitz Eric;
Author affiliation

Microsoft Research, Redmond, WA, 98052, U.S.A.;

Original format: PDF
Text language: English (eng)
Chinese Library Classification: Speech signal processing
Keywords
crowdsourcing; language understanding; natural language elicitation methods; spoken dialog;

Similar literature

1. From a Smoking Gun to Spent Fuel: Principled Subsampling Methods for Building Big Language Data Corpora from Monitor Corpora [J]. Jacqueline Hettel Tidwell, Data. 2019, No. 2
2. Collaborative Speech Data Acquisition for Under Resourced Languages through Crowdsourcing [J]. Sunita Arora, Karunesh Kumar Arora, Mukund Kumar Roy, Procedia Computer Science. 2016, No. 1
3. On the accuracy of different neural language model approaches to ADE extraction in natural language corpora [J]. Alexander Sboev, Anton Selivanov, Gleb Rylkov, Procedia Computer Science. 2021, No. a
4. Crowdsourcing the acquisition of natural language corpora: Methods and observations [C]. Wang William Yang, Bohus Dan, Kamar Ece, IEEE Workshop on Spoken Language Technology. 2012
5. Methods for Improving Natural Language Processing Techniques with Linguistic Regularities Extracted from Large Unlabeled Text Corpora [D]. Lucas, Michael Ryan. 2019
6. Building Gold Standard Corpora for Medical Natural Language Processing Tasks [O]. Louise Deleger, Qi Li, Todd Lingren, 2012
7. CROWDSOURCING THE ACQUISITION OF NATURAL LANGUAGE CORPORA: METHODS AND OBSERVATIONS [O]. William Yang Wang, Dan Bohus, Ece Kamar, 2013
