Conference: Mexican International Conference on Artificial Intelligence

Improving Label Accuracy by Filtering Low-Quality Workers in Crowdsourcing


Abstract

Data sets labeled via crowdsourcing often contain labels from low-quality workers, who either lack knowledge of the subject and thus contribute many incorrect labels, or intentionally label quickly and imprecisely in order to produce more labels in a short time. Filtering these workers out of the data set is therefore often necessary. We present two new filtering algorithms for removing low-quality workers: Cluster Filtering (CF) and Dynamic Classification Filtering (DCF). Both methods can use any number of worker characteristics as attributes for learning. CF applies k-means clustering with 2 centroids to separate the workers into a high-quality cluster and a low-quality cluster. DCF can use any kind of classifier: it builds a model from a set of workers drawn from other crowdsourced data sets, then classifies the workers in the data set to be filtered. In theory, DCF can be trained to remove any proportion of the lowest-quality workers. We compare the performance of DCF with two other filtering algorithms, one by Raykar and Yu (RY) and one by Ipeirotis et al. (IPW). Our results show that CF, the second-best filter, performs modestly but effectively, and that DCF, the best filter, performs much better than RY and IPW on average and on the majority of crowdsourced data sets.
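The two filters described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions the abstract does not state: each worker is described by a hypothetical feature tuple in which larger values indicate better quality (for example, agreement rate with the majority vote); the CF cluster with the larger centroid is treated as the high-quality one; and DCF is stood in for by a simple nearest-centroid classifier whose training workers carry a known accuracy, with the lowest `proportion` of them labeled low-quality. The names `two_means`, `cluster_filter`, and `dcf_filter` are illustrative only.

```python
import math


def mean_vec(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points, iters=100):
    """Plain k-means with k=2. Returns (labels, centroids)."""
    # Assumption: deterministic start from the points with the
    # smallest and largest feature sums.
    centroids = [list(min(points, key=sum)), list(max(points, key=sum))]
    labels = []
    for _ in range(iters):
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vec(members)
    return labels, centroids


def cluster_filter(workers):
    """CF sketch: keep the workers in the cluster with the larger centroid.

    workers: dict mapping worker id -> feature tuple (hypothetical features).
    """
    ids = list(workers)
    labels, centroids = two_means([workers[w] for w in ids])
    good = 0 if sum(centroids[0]) >= sum(centroids[1]) else 1
    return {w for w, lab in zip(ids, labels) if lab == good}


def dcf_filter(train, target, proportion=0.3):
    """DCF sketch with a nearest-centroid classifier as a stand-in.

    train:  list of (features, accuracy) for workers from OTHER data sets
            (assumed to have a known accuracy).
    target: dict mapping worker id -> features for the data set to filter.
    """
    ranked = sorted(train, key=lambda item: item[1])
    n_low = max(1, round(proportion * len(ranked)))
    if n_low >= len(ranked):
        raise ValueError("proportion leaves no high-quality training workers")
    c_low = mean_vec([f for f, _ in ranked[:n_low]])
    c_high = mean_vec([f for f, _ in ranked[n_low:]])
    # Keep a target worker if it is at least as close to the high-quality centroid.
    return {
        w for w, f in target.items()
        if math.dist(f, c_high) <= math.dist(f, c_low)
    }


# Made-up example data, for illustration only.
workers = {"a": (0.9, 0.8), "b": (0.85, 0.9), "c": (0.3, 0.2),
           "d": (0.95, 0.85), "e": (0.2, 0.35)}
other_sets = [((0.9, 0.9), 0.95), ((0.8, 0.85), 0.90), ((0.3, 0.2), 0.40),
              ((0.2, 0.3), 0.35), ((0.85, 0.8), 0.88)]
kept_cf = cluster_filter(workers)
kept_dcf = dcf_filter(other_sets, workers, proportion=0.4)
```

In this sketch the `proportion` argument is what lets the DCF stand-in target a chosen share of the lowest-quality workers, whereas the CF split is determined entirely by where the two clusters fall. Any other classifier could replace the nearest-centroid step.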
