AAAI Workshop on Human Computation

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach

Abstract

Crowdsourcing utilizes human ability by distributing tasks to a large number of workers. It is especially suitable for solving data clustering problems because it provides a way to obtain a similarity measure between objects based on manual annotations, which capture the human perception of similarity among objects. This is in contrast to most clustering algorithms, which face the challenge of finding an appropriate similarity measure for the given dataset. Several algorithms have been developed for crowdclustering that combine partial clustering results, each obtained from annotations provided by a different worker, into a single data partition. However, because human annotations are noisy, existing crowdclustering approaches require a large number of annotations, which leads to a high computational cost in addition to the large cost of annotation itself. We address this problem by developing a novel approach for crowdclustering that exploits the technique of matrix completion. Instead of using all the annotations, the proposed algorithm constructs a partially observed similarity matrix from the subset of pairwise annotation labels that are agreed upon by most annotators. It then applies a matrix completion algorithm to complete the similarity matrix and obtains the final data partition by applying a spectral clustering algorithm to the completed similarity matrix. We show, both theoretically and empirically, that the proposed approach needs only a small number of manual annotations to obtain an accurate data partition. In effect, we highlight the trade-off between a large number of noisy crowdsourced labels and a small number of high-quality labels.
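
The three stages described in the abstract (filter pairwise labels by annotator agreement, complete the similarity matrix, run spectral clustering) can be illustrated with a minimal NumPy sketch. The abstract does not name a specific completion solver or agreement threshold, so everything below is an assumption made for illustration: the function names, the `agree=0.75` threshold, and the use of iterative truncated-SVD imputation as a stand-in for the matrix completion step. It is not the authors' implementation.

```python
import numpy as np

def build_observed_similarity(votes, n, agree=0.75):
    """Keep a pair only when most workers agree (threshold is an assumption).

    votes maps (i, j) -> list of 0/1 worker labels (1 = "same cluster").
    Returns the partially observed similarity matrix and its mask.
    """
    S = np.zeros((n, n))
    mask = np.zeros((n, n), dtype=bool)
    for (i, j), labels in votes.items():
        frac = float(np.mean(labels))
        if frac >= agree or frac <= 1.0 - agree:
            S[i, j] = S[j, i] = 1.0 if frac >= agree else 0.0
            mask[i, j] = mask[j, i] = True
    idx = np.arange(n)
    S[idx, idx] = 1.0          # every object is similar to itself
    mask[idx, idx] = True
    return S, mask

def complete_matrix(S, mask, rank, n_iter=200):
    """Stand-in completion: repeat rank-r truncated SVD, then restore
    the observed entries (iterative hard imputation)."""
    X = np.where(mask, S, 0.5)  # neutral initial guess for unobserved pairs
    L = X
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, S, L)
    return (L + L.T) / 2        # symmetrize against round-off

def spectral_partition(A, k, n_iter=50):
    """Normalized spectral embedding followed by a small deterministic k-means."""
    A = np.clip(A, 0.0, 1.0)
    d = A.sum(axis=1)
    N = A / np.sqrt(np.outer(d, d))
    _, V = np.linalg.eigh(N)    # eigenvalues in ascending order
    E = V[:, -k:]
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    # Farthest-point initialization, starting from the first object.
    centers = [E[0]]
    for _ in range(1, k):
        dmin = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dmin)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    return labels
```

A typical call chain would be `S, mask = build_observed_similarity(votes, n)`, then `A = complete_matrix(S, mask, rank=k)`, then `labels = spectral_partition(A, k)`. Pairs on which workers split are left unobserved instead of being forced to a noisy majority value, which is the trade-off between many noisy labels and fewer reliable ones that the abstract highlights.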
