Conference: International Conference on Ubiquitous Information Management and Communication

No 'Love' Lost: Defending Hate Speech Detection Models Against Adversaries

Abstract

Although current state-of-the-art hate speech detection models achieve praiseworthy results, they have proven vulnerable to attacks. Easy-to-execute lexical evasion schemes, such as removing whitespace from a given text, create significant problems for word-based hate speech detection models. In this paper, we reproduce the results of five cutting-edge models as well as four significant evasion schemes from prior work. These schemes must preserve readability, which enables us to recreate the original data. We present several new defenses that leverage this need for preserved meaning and readability, and these defenses perform on par with or exceed the results of adversarial retraining. Furthermore, we demonstrate that each lexical attack or evasion scheme can be overcome with our new defense mechanisms, some of which reduce the effect of the scheme to a mere 0.1 to 0.01 drop in F1 score. We also propose a new evasion scheme that outperforms those in previous work, along with a corresponding defense. Using our results as a foundation, we contend that hate speech detection models can be defended against lexically morphed data without the need for significant retraining. Our work suggests that by utilizing the requirement for preserved meaning, one can create a suitable defense against evasion schemes with a high reversal rate.
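
The abstract does not spell out how the defenses work, so the following is only an illustrative sketch, not the authors' method. It shows the whitespace-removal evasion named above and one plausible way to reverse it: dictionary-based word segmentation, which relies on the attacked text still being readable. The function names `remove_whitespace` and `segment`, the toy vocabulary, and the fewest-words tie-breaking rule are all assumptions made for this example.

```python
from typing import List, Optional, Set


def remove_whitespace(text: str) -> str:
    """Evasion sketch: drop all whitespace so a word-based model sees one long token."""
    return "".join(text.split())


def segment(text: str, vocab: Set[str]) -> Optional[List[str]]:
    """Defense sketch (assumed, not from the paper): split `text` back into
    vocabulary words with dynamic programming, preferring the fewest words.
    Returns None when no full segmentation exists."""
    n = len(text)
    max_len = max((len(w) for w in vocab), default=0)
    # best[i] holds the best segmentation found for text[:i], or None.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            word = text[start:end]
            if prefix is None or word not in vocab:
                continue
            candidate = prefix + [word]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


# Toy vocabulary, hypothetical and far smaller than any real lexicon.
vocab = {"no", "love", "lost", "is", "this", "a", "test"}
attacked = remove_whitespace("no love lost")
restored = segment(attacked, vocab)
print(attacked)   # nolovelost
print(restored)   # ['no', 'love', 'lost']
```

A restored word sequence could then be passed to an unmodified word-based classifier, which is the sense in which a defense of this kind avoids retraining. A realistic version would need a large vocabulary and a scoring rule, such as word frequencies, to choose among competing segmentations.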
