Published in: IEEE International Conference on Data Science and Advanced Analytics

Predicting Worker Disagreement for More Effective Crowd Labeling

Abstract

Crowdsourcing is a popular mechanism for labeling data to produce large training corpora. However, producing a reliable crowd-labeled training corpus is challenging and resource-intensive. Research on crowdsourcing has shown that label quality is strongly affected by worker engagement and expertise. In this study, we postulate that label quality can also be affected by the inherent ambiguity of the documents to be labeled. Such ambiguities are, of course, not known in advance; but once workers encounter them, they lead to disagreement in the labeling, and this disagreement cannot be resolved by employing more workers. To deal with this problem, we propose a crowd labeling framework: we train a disagreement predictor on a small seed set of documents, and then use this predictor to decide which documents of the complete corpus should be labeled and which should first be checked for document-inherent ambiguities before worker effort is assigned (and potentially wasted) on them. We report the findings of experiments in which we crowdsourced the labeling of a Twitter corpus for sentiment classification.
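The abstract does not say how disagreement is quantified or what kind of predictor is trained. The sketch below illustrates only the routing idea, under stated assumptions: (1) a seed document's disagreement is the fraction of worker labels that differ from the majority label; (2) a document counts as "ambiguous" when that fraction exceeds a hypothetical cutoff of 0.3; and (3) the predictor is a bag-of-words logistic regression trained by plain gradient descent, a stand-in and not necessarily the authors' model. All function names, thresholds, and example tweets are hypothetical.

```python
import math
from collections import Counter


def disagreement(labels):
    """Assumed measure: fraction of worker labels that differ from the majority (0.0 = unanimous)."""
    counts = Counter(labels)
    return 1.0 - max(counts.values()) / len(labels)


def featurize(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=300, lr=0.5):
    """Stand-in predictor: stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # gradient of the loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def route(seed, corpus, cutoff=0.3, threshold=0.5):
    """Split corpus into (send_to_workers, check_for_ambiguity).

    seed: (text, worker_labels) pairs already labeled by the crowd.
    cutoff: hypothetical disagreement level above which a seed document counts as ambiguous.
    threshold: hypothetical predicted probability at or above which a document is held back.
    """
    vocab = sorted({tok for text, _ in seed for tok in text.lower().split()})
    X = [featurize(text, vocab) for text, _ in seed]
    y = [1.0 if disagreement(labels) > cutoff else 0.0 for _, labels in seed]
    w, b = train_logreg(X, y)

    to_label, to_check = [], []
    for text in corpus:
        x = featurize(text, vocab)
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
        (to_check if p >= threshold else to_label).append(text)
    return to_label, to_check


# Hypothetical seed set: each tweet with the sentiment labels from three workers.
SEED = [
    ("great phone love it", ["pos", "pos", "pos"]),
    ("terrible service hate it", ["neg", "neg", "neg"]),
    ("oh great another delay yeah right", ["pos", "neg", "neu"]),
    ("yeah right best day ever", ["pos", "neg", "neg"]),
    ("love this song", ["pos", "pos", "pos"]),
    ("hate mondays", ["neg", "neg", "neg"]),
]
# Hypothetical unlabeled documents from the complete corpus.
CORPUS = ["love my new phone", "yeah right that went well"]

to_label, to_check = route(SEED, CORPUS)
print("send to workers:", to_label)
print("check for ambiguity first:", to_check)
```

Documents routed to `to_check` would be inspected for inherent ambiguity before any worker effort is spent on them. Raising `threshold` sends more of the corpus straight to workers at the risk of letting ambiguous documents through; lowering it holds back more documents for checking.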
