Conference: Annual Meeting of the Association for Computational Linguistics

Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection



Abstract

Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts. Crowdsourcing the process of paraphrase generation is an effective means of expanding natural language datasets, but there has been limited analysis of the trade-offs that arise when designing tasks. In this paper, we present the first systematic study of the key factors in crowdsourcing paraphrase collection. We consider variations in instructions, incentives, data domains, and workflows. We manually analyzed paraphrases for correctness, grammaticality, and linguistic diversity. Our observations provide new insight into the trade-offs between accuracy and diversity in crowd responses that arise as a result of task design, providing guidance for future paraphrase generation procedures.
