Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach

Abstract

Crowdsourcing utilizes human ability by distributing tasks to a large number of workers. It is especially suitable for solving data clustering problems because it provides a way to obtain a similarity measure between objects based on manual annotations, which capture the human perception of similarity among objects. This is in contrast to most clustering algorithms, which face the challenge of finding an appropriate similarity measure for the given dataset. Several algorithms have been developed for crowdclustering that combine partial clustering results, each obtained from annotations provided by a different worker, into a single data partition. However, because human annotations are noisy, existing crowdclustering approaches require a large number of annotations, leading to a high computational cost in addition to the large cost of annotation itself. We address this problem by developing a novel approach for crowdclustering that exploits the technique of matrix completion. Instead of using all the annotations, the proposed algorithm constructs a partially observed similarity matrix based on a subset of pairwise annotation labels that are agreed upon by most annotators. It then applies a matrix completion algorithm to fill in the similarity matrix and obtains the final data partition by applying a spectral clustering algorithm to the completed similarity matrix. We show, both theoretically and empirically, that the proposed approach needs only a small number of manual annotations to obtain an accurate data partition. In effect, we highlight the trade-off between a large number of noisy crowdsourced labels and a small number of high-quality labels.
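The first step described in the abstract, keeping only the pairwise labels that most annotators agree on, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function name observed_similarity, the input layout (one dict per worker mapping object index to that worker's cluster label), and the agreement thresholds low and high. The matrix completion and spectral clustering steps are not shown.

```python
from itertools import combinations


def observed_similarity(worker_labels, n_objects, low=0.2, high=0.8):
    """Build a partially observed similarity matrix from per-worker labels.

    worker_labels: list of dicts, each mapping an object index to the
        cluster label one worker gave it (workers may label only a subset).
    low, high: hypothetical agreement thresholds (not from the paper).
    Returns an n x n list of lists with 1 (similar), 0 (dissimilar),
    or None (unobserved, left for a later completion step).
    """
    same = [[0] * n_objects for _ in range(n_objects)]  # "same cluster" votes
    seen = [[0] * n_objects for _ in range(n_objects)]  # workers who saw both

    for labels in worker_labels:
        for i, j in combinations(sorted(labels), 2):  # i < j
            seen[i][j] += 1
            same[i][j] += labels[i] == labels[j]

    sim = [[None] * n_objects for _ in range(n_objects)]
    for i in range(n_objects):
        sim[i][i] = 1  # every object is similar to itself
        for j in range(i + 1, n_objects):
            if seen[i][j] == 0:
                continue  # no worker labeled both objects
            frac = same[i][j] / seen[i][j]
            if frac >= high:
                sim[i][j] = sim[j][i] = 1
            elif frac <= low:
                sim[i][j] = sim[j][i] = 0
            # otherwise the workers disagree: leave the entry unobserved
    return sim


# Four hypothetical workers, each labeling three of four objects.
workers = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "x", 3: "y"},
    {1: "p", 2: "q", 3: "q"},
    {0: "m", 2: "m", 3: "n"},
]
sim = observed_similarity(workers, n_objects=4)
for row in sim:
    print(row)
```

In this toy input the workers split evenly on the pairs (0, 2) and (2, 3), so those entries stay unobserved, while the pairs with unanimous votes become reliable 0 or 1 entries. Raising high or lowering low discards more labels but leaves fewer noisy ones, which is the trade-off the abstract points to.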

Bibliographic information

Source: AAAI Workshop on Human Computation, 2012, 7 pages
Authors: Jinfeng Yi; Rong Jin; Anil K. Jain; Shaili Jain
Original format: PDF
CLC classification: 73.966083
Keywords: Crowdclustering; Sparse; Approach

Similar literature

1. A transversal approach for patch-based label fusion via matrix completion [J]. Gerard Sanroma, Guorong Wu, Yaozong Gao. Medical Image Analysis, 2015, No. 1
2. Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers [J]. Yao Ma, Alex Olshevsky, Csaba Szepesvari. Journal of Machine Learning Research, 2020, No. a
3. Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers [J]. Yao Ma, Alexander Olshevsky, Csaba Szepesvari. JMLR: Workshop and Conference Proceedings, 2018, No. 2010
4. Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach [C]. Jinfeng Yi, Rong Jin, Anil K. Jain. AAAI Workshop on Human Computation, 2012
5. Non-convex Methods for Spectrally Sparse Signal Reconstruction via Low-rank Hankel Matrix Completion [D]. Tianming Wang. 2018
6. A transversal approach for patch-based label fusion via matrix completion [O]. Gerard Sanroma, Guorong Wu, Yaozong Gao. 2015
