CrowdTeacher: Robust Co-teaching with Noisy Answers and Sample-Specific Perturbations for Tabular Data

Abstract

Samples with ground-truth labels are not always available in many domains. While learning from crowdsourced labels has been explored, existing models can still fail in the presence of sparse, unreliable, or differing annotations. Co-teaching methods have shown promising improvements on computer vision problems with noisy labels by employing two classifiers, each trained on the other's confident samples in every batch. Inspired by this idea of separating confident and uncertain samples during training, we extend it to the crowdsourcing problem. Our model, CrowdTeacher, builds on the idea that perturbation in the input space can improve a classifier's robustness to noisy labels. Treating crowdsourced annotations as a source of noisy labels, we perturb samples based on the certainty of the aggregated annotations. The perturbed samples are fed to a Co-teaching algorithm tuned to also accommodate smaller tabular data. We demonstrate the gain in predictive power attained by CrowdTeacher on both synthetic and real datasets across various label density settings. Our experiments show that the proposed approach outperforms three groups of baselines: methods that model individual annotations and then combine them, methods that simultaneously learn a classifier and infer truth labels, and the Co-teaching algorithm applied to labels aggregated through common truth inference methods.
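The abstract names three ingredients: aggregating crowd annotations, perturbing each sample according to the certainty of its aggregated label, and exchanging confident (small-loss) samples between two classifiers. The following is a minimal NumPy sketch of those ingredients, not the authors' implementation. It assumes binary labels, majority-vote aggregation, and Gaussian noise whose magnitude grows as certainty falls. The function names, the `scale` and `keep_ratio` parameters, and the direction of the noise scaling are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def aggregate_majority(annotations):
    """Majority-vote aggregation for binary crowd labels (assumed scheme).

    annotations: (n_samples, n_annotators) array of 0/1, np.nan where an
    annotator did not label the sample.
    Returns (labels, certainty), with certainty in [0.5, 1.0].
    """
    counts = np.sum(~np.isnan(annotations), axis=1)   # labels per sample
    votes = np.nansum(annotations, axis=1)            # positive votes
    frac_pos = votes / np.maximum(counts, 1)
    labels = (frac_pos >= 0.5).astype(int)
    certainty = np.maximum(frac_pos, 1.0 - frac_pos)
    # A sample with no annotations carries no evidence either way.
    certainty = np.where(counts > 0, certainty, 0.5)
    return labels, certainty


def perturb(X, certainty, scale=0.1, rng=None):
    """Add sample-specific Gaussian noise (hypothetical perturbation rule).

    Noise is scaled per feature by its standard deviation and per sample
    by uncertainty, so fully certain samples are left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    uncertainty = 2.0 * (1.0 - certainty)             # 1.0 -> 0, 0.5 -> 1
    noise = rng.normal(size=X.shape) * X.std(axis=0, keepdims=True)
    return X + scale * uncertainty[:, None] * noise


def co_teaching_exchange(loss_a, loss_b, keep_ratio):
    """Small-loss selection used by Co-teaching style training.

    Each classifier picks the samples it finds easiest (lowest loss) in
    the batch, and its peer is updated on those samples.
    Returns (indices to train A on, indices to train B on).
    """
    n_keep = max(1, int(round(keep_ratio * len(loss_a))))
    idx_for_b = np.argsort(loss_a)[:n_keep]           # A's picks train B
    idx_for_a = np.argsort(loss_b)[:n_keep]           # B's picks train A
    return idx_for_a, idx_for_b
```

In a training loop under these assumptions, `aggregate_majority` would run once on the annotation matrix, `perturb` would be applied to the features before each batch is drawn, and `co_teaching_exchange` would be called on the per-sample losses of the two classifiers to decide which samples each one is updated on.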

Bibliographic record

Source
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2021 | pp. 181-193 | 13 pages
Authors
Mani Sotoodeh; Li Xiong; Joyce Ho;
Original format: PDF
Keywords
Crowdsourcing; Noisy labels; Input space perturbation;
