Annual Conference of the International Speech Communication Association; INTERSPEECH 2010

Creating a Linguistic Plausibility Dataset with Non-Expert Annotators

Abstract

We describe the creation of a linguistic plausibility dataset that contains annotated examples of language judged to be linguistically plausible, implausible, and everything in between. To create the dataset, we randomly generate sentences and have them annotated through crowdsourcing on Amazon Mechanical Turk. Obtaining inter-annotator agreement is difficult because linguistic plausibility is highly subjective. The annotations obtained depend, among other factors, on the manner in which annotators are questioned about the plausibility of sentences. We describe our experiments on posing a number of different questions to the annotators in order to elicit the responses with the greatest agreement, and we present several methods for analyzing the resulting responses. The generated dataset and annotations are being made available to the public.
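
The abstract does not say which agreement statistic the authors used. As one common way to quantify inter-annotator agreement on categorical judgments, the sketch below computes Fleiss' kappa in plain Python. The label names ("plausible", "unsure", "implausible"), the function name, and the sample ratings are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings:    list of lists; ratings[i] holds the labels given to item i.
    categories: all labels that raters could choose from.
    Illustrative only: the paper's own analysis methods are not specified here.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: fraction of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall label proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        # Only one label was ever used, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings: three annotators per sentence.
LABELS = ["plausible", "unsure", "implausible"]
sample = [
    ["plausible", "plausible", "unsure"],
    ["implausible", "implausible", "implausible"],
    ["plausible", "implausible", "unsure"],
]
kappa = fleiss_kappa(sample, LABELS)
```

Comparing this value across question wordings would be one way to see which phrasing elicits the most consistent responses, although kappa is only one of several possible measures.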
