Conference: International Conference on Ubiquitous Information Management and Communication

No 'Love' Lost: Defending Hate Speech Detection Models Against Adversaries



Abstract

Although current state-of-the-art hate speech detection models achieve praiseworthy results, these models have shown themselves to be vulnerable to attacks. Easy-to-execute lexical evasion schemes, such as removing whitespace from a given text, create significant issues for word-based hate speech detection models. In this paper, we reproduce the results of five cutting-edge models as well as four significant evasion schemes from prior work. These schemes must preserve readability, which enables us to recreate the original data. We present several new defenses that leverage this need for preserved meaning and readability, and these defenses perform on par with or exceed the results of adversarial retraining. Furthermore, we demonstrate that each lexical attack or evasion scheme can be overcome with our new defense mechanisms, with some reducing the effectiveness of the scheme to a mere 0.1 to 0.01 drop in F1 score. We also propose a new evasion scheme that outperforms those in previous work, along with a corresponding defense. Using our results as a foundation, we contend that hate speech detection models can be defended against lexically morphed data without the need for significant retraining. Our work suggests that by utilizing the requirement for preserved meaning, one can create a suitable defense against evasion schemes with a high reversal rate.
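To make the whitespace-removal example concrete, the following is a minimal sketch of that evasion scheme and of one possible way to reverse it. It is not the paper's defense: the dictionary-based re-segmentation, the function names, and the toy vocabulary `VOCAB` are all assumptions chosen for illustration. A practical version would presumably need a much larger lexicon or a language model to rank candidate segmentations.

```python
# Hypothetical toy vocabulary (an assumption for this sketch, not from the paper).
VOCAB = {"i", "really", "dislike", "those", "people"}


def remove_whitespace(text: str) -> str:
    """Evasion: delete all whitespace so a word-based model sees one unknown token."""
    return "".join(text.split())


def restore_whitespace(text: str, vocab=VOCAB) -> str:
    """Defense sketch: re-segment text into the fewest vocabulary words.

    Uses dynamic programming over prefixes. If no full segmentation
    exists, the input is returned unchanged.
    """
    n = len(text)
    max_len = max(map(len, vocab))
    best = [None] * (n + 1)  # best[i] holds the word list covering text[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if best[j] is not None and word in vocab:
                candidate = best[j] + [word]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return " ".join(best[n]) if best[n] is not None else text


original = "i really dislike those people"
attacked = remove_whitespace(original)
recovered = restore_whitespace(attacked)
```

Because the attacked text has to stay readable to humans, the original words are still present in order, which is what lets a segmentation step like this recover the input before it reaches the classifier.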
