International Conference on Machine Learning

Adversarial Filters of Dataset Biases



Abstract

Large neural models have demonstrated human-level performance on language and vision benchmarks, while their performance degrades considerably on adversarial or out-of-distribution samples. This raises the question of whether these models have learned to solve a dataset rather than the underlying task by overfitting to spurious dataset biases. We investigate one recently proposed approach, AFLite, which adversarially filters such dataset biases, as a means to mitigate the prevalent overestimation of machine performance. We provide a theoretical understanding for AFLite, by situating it in the generalized framework for optimum bias reduction. We present extensive supporting evidence that AFLite is broadly applicable for reduction of measurable dataset biases, and that models trained on the filtered datasets yield better generalization to out-of-distribution tasks. Finally, filtering results in a large drop in model performance (e.g., from 92% to 62% for SNLI), while human performance still remains high. Our work thus shows that such filtered datasets can pose new research challenges for robust generalization by serving as upgraded benchmarks.
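The abstract describes AFLite as an algorithm that adversarially filters out instances whose labels simple models can predict too reliably. A minimal numpy sketch of this style of filtering, assuming least-squares linear probes as the cheap adversary and illustrative hyperparameter names (not the paper's exact procedure):

```python
import numpy as np

def aflite_sketch(X, y, target_size, num_partitions=32, train_frac=0.8,
                  remove_per_iter=10, seed=0):
    """AFLite-style adversarial filtering (illustrative sketch, not the
    authors' implementation).

    Repeatedly trains cheap linear probes on random partitions of the data
    and removes the instances that held-out probes classify most reliably,
    treating high out-of-sample predictability as a proxy for
    bias-carrying "easy" examples.
    """
    rng = np.random.default_rng(seed)
    Xa = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    keep = np.arange(len(y))                    # indices still in the dataset
    while len(keep) > target_size:
        correct = np.zeros(len(keep))
        counts = np.zeros(len(keep))
        for _ in range(num_partitions):
            perm = rng.permutation(len(keep))
            n_train = int(train_frac * len(keep))
            tr, te = perm[:n_train], perm[n_train:]
            # Least-squares linear probe fit on the random training split.
            w, *_ = np.linalg.lstsq(Xa[keep[tr]], y[keep[tr]], rcond=None)
            pred = (Xa[keep[te]] @ w > 0.5).astype(int)
            correct[te] += pred == y[keep[te]]
            counts[te] += 1
        # Predictability score: fraction of held-out probes that were right.
        score = np.divide(correct, counts,
                          out=np.zeros_like(correct), where=counts > 0)
        # Drop the most predictable instances this round.
        k = min(remove_per_iter, len(keep) - target_size)
        keep = np.delete(keep, np.argsort(score)[-k:])
    return keep
```

On a synthetic dataset where half the instances carry a spuriously label-revealing feature, the sketch preferentially filters out those instances, which is the mechanism behind the benchmark accuracy drop the abstract reports.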
