Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

Abstract

The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galán-García et al., 2016), and the deployment of chatbots in the public domain (Wolf et al., 2017) are two examples that show the necessity of guarding against adversarially offensive behavior on the part of humans. In this work, we develop a training scheme for a model to become robust to such human attacks by an iterative build it, break it, fix it strategy with humans and models in the loop. In detailed experiments we show this approach is considerably more robust than previous systems. Further, we show that offensive language used within a conversation critically depends on the dialogue context, and cannot be viewed as a single sentence offensive detection task as in most previous work. Our newly collected tasks and methods are all made open source and publicly available.
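
The training scheme described in the abstract lends itself to a short illustration. The Python sketch below shows one way the iterative build it, break it, fix it loop could be organized; it is an assumption-laden outline, not the authors' implementation, and every name in it (build_break_fix, train, break_it, Example) is a hypothetical placeholder.

from typing import Callable, List, Tuple

# One training example: the dialogue turns so far (ending with the utterance
# to judge) and a label, 1 = offensive, 0 = safe. The context turns are kept
# because, per the abstract, offensiveness critically depends on them.
Example = Tuple[List[str], int]

def build_break_fix(
    train: Callable[[List[Example]], object],
    break_it: Callable[[object], List[Example]],
    seed_data: List[Example],
    rounds: int = 3,
) -> object:
    """Iteratively harden an offensive-language classifier against human attack."""
    data = list(seed_data)
    model = train(data)          # build it: fit an initial classifier
    for _ in range(rounds):
        # break it: human attackers probe the current model and submit
        # offensive messages it wrongly classifies as safe
        attacks = break_it(model)
        if not attacks:          # attackers could no longer fool the model
            break
        # fix it: fold the successful attacks into the training set and retrain
        data.extend(attacks)
        model = train(data)
    return model

In the paper's setting, break_it corresponds to crowdworkers attacking the current model; each round's successful attacks become training data, so each successive model is harder to break.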