Information Retrieval (journal)

An analysis of human factors and label accuracy in crowdsourcing relevance judgments

Abstract

Crowdsourcing relevance judgments for the evaluation of search engines is used increasingly to overcome the issue of scalability that hinders traditional approaches relying on a fixed group of trusted expert judges. However, the benefits of crowdsourcing come with risks due to the engagement of a self-forming group of individuals (the crowd), motivated by different incentives, who complete the tasks with varying levels of attention and success. This increases the need for a careful design of crowdsourcing tasks that attracts the right crowd for the given task and promotes quality work. In this paper, we describe a series of experiments using Amazon’s Mechanical Turk, conducted to explore the ‘human’ characteristics of the crowds involved in a relevance assessment task. In the experiments, we vary the level of pay offered, the effort required to complete a task and the qualifications required of the workers. We observe the effects of these variables on the quality of the resulting relevance labels, measured based on agreement with a gold set, and correlate them with self-reported measures of various human factors. We elicit information from the workers about their motivations, interest and familiarity with the topic, perceived task difficulty, and satisfaction with the offered pay. We investigate how these factors combine with aspects of the task design and how they affect the accuracy of the resulting relevance labels. Based on the analysis of 960 HITs and 2,880 HIT assignments resulting in 19,200 relevance labels, we arrive at insights into the complex interaction of the observed factors and provide practical guidelines to crowdsourcing practitioners. In addition, we highlight challenges in the data analysis that stem from the peculiarity of the crowdsourcing environment, where the sample of individuals engaged in specific work conditions is inherently influenced by the conditions themselves.
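The accuracy measure referenced in the abstract, agreement between crowd-assigned relevance labels and a gold set, can be illustrated with a minimal sketch. The data layout, field names, and per-worker aggregation below are illustrative assumptions for clarity only, not the authors' actual analysis pipeline.

```python
# Minimal sketch (assumed data layout, not the paper's pipeline):
# compute each worker's label accuracy as the fraction of their labels
# that agree with gold (expert) relevance judgments.

from collections import defaultdict

# Hypothetical crowd judgments: (worker_id, doc_id, relevance_label)
crowd_labels = [
    ("w1", "d1", 1), ("w1", "d2", 0), ("w1", "d3", 1),
    ("w2", "d1", 1), ("w2", "d2", 1), ("w2", "d3", 0),
]

# Hypothetical gold labels: doc_id -> relevance_label
gold = {"d1": 1, "d2": 0, "d3": 1}

def accuracy_per_worker(labels, gold):
    """Fraction of each worker's labels that agree with the gold set."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for worker, doc, label in labels:
        if doc in gold:                 # only gold-judged documents count
            totals[worker] += 1
            hits[worker] += int(label == gold[doc])
    return {w: hits[w] / totals[w] for w in totals}

print(accuracy_per_worker(crowd_labels, gold))
# e.g. {'w1': 1.0, 'w2': 0.333...}
```

The same per-worker (or per-condition) accuracy can then be correlated with the self-reported human-factor measures described above, such as motivation, topic familiarity, perceived difficulty, and pay satisfaction.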