Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack

Abstract

The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galán-García et al., 2016), and the deployment of chatbots in the public domain (Wolf et al., 2017) are two examples that show the necessity of guarding against adversarially offensive behavior on the part of humans. In this work, we develop a training scheme for a model to become robust to such human attacks by an iterative build it, break it, fix it strategy with humans and models in the loop. In detailed experiments we show this approach is considerably more robust than previous systems. Further, we show that offensive language used within a conversation critically depends on the dialogue context, and cannot be viewed as a single sentence offensive detection task as in most previous work. Our newly collected tasks and methods are all made open source and publicly available.
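
The training scheme described in the abstract lends itself to a short illustration. The Python sketch below shows one way the iterative build it, break it, fix it loop could be organized; it is an assumption-laden outline, not the authors' implementation, and every name in it (build_break_fix, train, break_it, Example) is a hypothetical placeholder.

from typing import Callable, List, Tuple

# One training example: the dialogue turns so far (ending with the utterance
# to judge) and a label, 1 = offensive, 0 = safe. The context turns are kept
# because, per the abstract, offensiveness critically depends on them.
Example = Tuple[List[str], int]

def build_break_fix(
    train: Callable[[List[Example]], object],
    break_it: Callable[[object], List[Example]],
    seed_data: List[Example],
    rounds: int = 3,
) -> object:
    """Iteratively harden an offensive-language classifier against human attack."""
    data = list(seed_data)
    model = train(data)          # build it: fit an initial classifier
    for _ in range(rounds):
        # break it: human attackers probe the current model and submit
        # offensive messages it wrongly classifies as safe
        attacks = break_it(model)
        if not attacks:          # attackers could no longer fool the model
            break
        # fix it: fold the successful attacks into the training set and retrain
        data.extend(attacks)
        model = train(data)
    return model

In the paper's setting, break_it corresponds to crowdworkers attacking the current model; each round's successful attacks become training data, so each successive model is harder to break.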