No 'Love' Lost: Defending Hate Speech Detection Models Against Adversaries

Abstract

Although current state-of-the-art hate speech detection models achieve praiseworthy results, they have proven vulnerable to attacks. Easy-to-execute lexical evasion schemes, such as removing whitespace from a given text, create significant issues for word-based hate speech detection models. In this paper, we reproduce the results of five cutting-edge models as well as four significant evasion schemes from prior work. These schemes must preserve readability, which enables us to recreate the original data. We present several new defenses that leverage this need for preserved meaning and readability, and these defenses perform on par with or exceed the results of adversarial retraining. Furthermore, we demonstrate that each lexical attack or evasion scheme can be overcome with our new defense mechanisms, with some reducing the effectiveness of the scheme to a mere 0.1 to 0.01 drop in F1 score. We also propose a new evasion scheme that outperforms those in previous work, along with a corresponding defense. Using our results as a foundation, we contend that hate speech detection models can be defended against lexically morphed data without the need for significant retraining. Our work suggests that, by exploiting the requirement for preserved meaning, one can create a suitable defense against evasion schemes with a high reversal rate.
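
To make the whitespace-removal scheme and the idea of reversing it concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the toy vocabulary, and the fewest-words dynamic-programming segmentation are all assumptions chosen for illustration. It only shows why an attack that must stay readable can, in principle, be undone before the text reaches a word-based classifier.

```python
from typing import List, Optional, Set


def remove_whitespace(text: str) -> str:
    """Hypothetical evasion step: delete all whitespace so the text becomes one token."""
    return "".join(text.split())


def segment(text: str, vocab: Set[str]) -> Optional[List[str]]:
    """Hypothetical defense step: split a whitespace-free string into vocabulary words.

    best[i] holds a segmentation of text[:i] that uses the fewest words,
    or None when text[:i] cannot be split into vocabulary words.
    """
    max_len = max(map(len, vocab))
    best: List[Optional[List[str]]] = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        # Only look back as far as the longest vocabulary word.
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            word = text[start:end]
            if prefix is None or word not in vocab:
                continue
            candidate = prefix + [word]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[-1]


# Toy vocabulary (an assumption; a real defense would need a large lexicon).
vocab = {"no", "love", "lost", "is", "here"}

attacked = remove_whitespace("no love lost")   # "nolovelost"
restored = segment(attacked, vocab)
print(" ".join(restored) if restored else attacked)  # no love lost
```

The restored text can then be passed to an unchanged detection model, which is the sense in which such a defense avoids retraining. Preferring the segmentation with the fewest words is only one possible tie-breaking rule; a frequency-weighted score would be a natural alternative.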

Bibliographic details

Source
International Conference on Ubiquitous Information Management and Communication | 2020 | 1 v. | 6 pages
Authors
Melody Moh; Teng-Sheng Moh; Brian Khieu
Original format
PDF
Chinese Library Classification
General structure; system structure
Keywords
Voice activity detection; Machine learning; Feature extraction; Data models; Computational modeling; Computer science; Social network services
