Predicting Worker Disagreement for More Effective Crowd Labeling


Abstract

Crowdsourcing is a popular mechanism used for labeling tasks to produce large corpora for training. However, producing a reliable crowd-labeled training corpus is challenging and resource-consuming. Research on crowdsourcing has shown that label quality is strongly affected by worker engagement and expertise. In this study, we postulate that label quality can also be affected by inherent ambiguity of the documents to be labeled. Such ambiguities are not known in advance, of course, but, once encountered by the workers, they lead to disagreement in the labeling, a disagreement that cannot be resolved by employing more workers. To deal with this problem, we propose a crowd labeling framework: we train a disagreement predictor on a small seed of documents, and then use this predictor to decide which documents of the complete corpus should be labeled and which should be checked for document-inherent ambiguities before assigning (and potentially wasting) worker effort on them. We report on the findings of the experiments we conducted on crowdsourcing a Twitter corpus for sentiment classification.
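
The framework in the abstract (train a disagreement predictor on a small labeled seed, then route the remaining documents) can be illustrated with a minimal sketch. This is not the authors' model: the disagreement measure (share of non-majority labels), the token-averaging predictor, the function names, the toy tweets, and the 0.3 threshold are all assumptions chosen only to keep the example self-contained.

```python
from collections import Counter, defaultdict

def disagreement(labels):
    """Share of worker labels that differ from the majority label (0 = unanimous)."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1 - majority_count / len(labels)

def train_predictor(seed):
    """seed: list of (text, worker_labels) pairs.
    Returns (token -> mean disagreement of seed docs containing it, global mean)."""
    sums, counts, scores = defaultdict(float), defaultdict(int), []
    for text, labels in seed:
        d = disagreement(labels)
        scores.append(d)
        for tok in set(text.lower().split()):
            sums[tok] += d
            counts[tok] += 1
    token_scores = {tok: sums[tok] / counts[tok] for tok in sums}
    return token_scores, sum(scores) / len(scores)

def predict(text, token_scores, default):
    """Predicted disagreement: mean score of known tokens, else the seed average."""
    known = [token_scores[t] for t in set(text.lower().split()) if t in token_scores]
    return sum(known) / len(known) if known else default

def route(corpus, token_scores, default, threshold=0.3):
    """Split the corpus into documents to label and documents to check for ambiguity."""
    to_label, to_check = [], []
    for text in corpus:
        target = to_check if predict(text, token_scores, default) > threshold else to_label
        target.append(text)
    return to_label, to_check

# Hypothetical seed: three tweets, each with sentiment labels from several workers.
seed = [
    ("great phone love it", ["pos", "pos", "pos"]),
    ("yeah right great service", ["pos", "neg", "neg", "pos"]),
    ("terrible battery", ["neg", "neg", "neg"]),
]
token_scores, default = train_predictor(seed)
to_label, to_check = route(["love it", "yeah right"], token_scores, default)
```

In this toy run, "love it" is sent to the workers, while "yeah right" is held back for an ambiguity check because its tokens only appeared in a seed tweet on which the workers split evenly. A real predictor would presumably use richer features and a trained classifier or regressor.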


Bibliographic details

Source
IEEE International Conference on Data Science and Advanced Analytics | 2018 | pp. 179-188 | 10 pages
Authors
Stefan Räbiger; Gizem Gezici; Yücel Saygın; Myra Spiliopoulou
Original format
PDF
Keywords
Crowdsourcing; Task analysis; Reliability; Silicon; Labeling; Training; Prediction algorithms


Similar literature

1. Agreement/disagreement based crowd labeling [J]. Hossein Amirkhani, Mohammad Rahmati. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2014, No. 1
2. Cost-effective quality assurance in crowd labeling [J]. Jing Wang, Panagiotis G. Ipeirotis, Foster Provost. Operations Research. 2018, No. 4
3. Cost-Effective Quality Assurance in Crowd Labeling [J]. Wang Jing, Ipeirotis Panagiotis G., Provost Foster. Information Systems Research. 2017, No. 1
4. Predicting Worker Disagreement for More Effective Crowd Labeling [C]. Stefan Räbiger, Gizem Gezici, Yücel Saygın. IEEE International Conference on Data Science and Advanced Analytics. 2019
5. Toward a Robust and Universal Crowd Labeling Framework [D]. Khattak, Faiza Khan. 2017
6. Aggregating and Predicting Sequence Labels from Crowd Annotations [O]. An T. Nguyen, Byron C. Wallace, Junyi Jessy Li
7. On aggregating labels from multiple crowd workers to infer relevance of documents [O]. Mehdi Hosseini, Ingemar J. Cox, Nataša Milić-Frayling. 2012
