
How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets?


Abstract

A considerable body of research deals with the automatic identification of hate speech and related phenomena. However, cross-dataset model generalization remains a challenge. In this context, we address two still open central questions: (i) to what extent does the generalization depend on the model and the composition and annotation of the training data in terms of different categories?, and (ii) do specific features of the datasets or models influence the generalization potential? To answer (i), we experiment with BERT, ALBERT, fastText, and SVM models trained on nine common public English datasets, whose class (or category) labels are standardized (and thus made comparable), in intra- and cross-dataset setups. The experiments show that indeed the generalization varies from model to model and that some of the categories (e.g., 'toxic', 'abusive', or 'offensive') serve better as cross-dataset training categories than others (e.g., 'hate speech'). To answer (ii), we use a Random Forest model for assessing the relevance of different model and dataset features during the prediction of the performance of 450 BERT, 450 ALBERT, 450 fastText, and 348 SVM binary abusive language classifiers (1698 in total). We find that in order to generalize well, a model already needs to perform well in an intra-dataset scenario. Furthermore, we find that some other parameters are equally decisive for the success of the generalization, including, e.g., the training and target categories and the percentage of the out-of-domain vocabulary.
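The intra- vs. cross-dataset comparison at the core of the study can be illustrated with a minimal sketch (not the authors' code): a classifier is trained on one dataset and scored both on held-out data from the same dataset (intra) and on a second, differently sourced dataset (cross). The toy sentences and the TF-IDF + linear SVM pipeline below are stand-ins assumed for illustration; the paper's experiments use nine public corpora with standardized category labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

# Toy stand-ins for two datasets sharing a standardized binary label
# (1 = abusive, 0 = not abusive); real datasets hold thousands of posts.
train_a_texts = ["you are awful trash", "have a nice day",
                 "awful hateful person", "great work, thanks"]
train_a_labels = [1, 0, 1, 0]
test_a = (["you awful person", "nice work"], [1, 0])   # same source as training
test_b = (["total trash human", "lovely day today"], [1, 0])  # different source

# One of the paper's model families (SVM), here as a TF-IDF pipeline.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_a_texts, train_a_labels)

# Intra-dataset score vs. cross-dataset score: the gap between the two
# is the generalization loss the study quantifies.
intra_f1 = f1_score(test_a[1], model.predict(test_a[0]))
cross_f1 = f1_score(test_b[1], model.predict(test_b[0]))
print(f"intra-dataset F1: {intra_f1:.2f}, cross-dataset F1: {cross_f1:.2f}")
```

On real data the cross-dataset score is typically lower than the intra-dataset one, and the study's second contribution is predicting that gap from model and dataset features (e.g., out-of-domain vocabulary percentage) with a Random Forest.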

Bibliographic record

  • Source
    Information Processing & Management | 2021, Issue 3 | pp. 102524.1-102524.17 | 17 pages
  • Author affiliations

    Natural Language Processing Group, Department of Communication and Information Technologies, Pompeu Fabra University, Spain;

    Natural Language Processing Group, Department of Communication and Information Technologies, Pompeu Fabra University, Spain;

    Catalan Institute for Research and Advanced Studies (ICREA), Spain; Natural Language Processing Group, Department of Communication and Information Technologies, Pompeu Fabra University, Spain;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    Hate speech; Offensive language; Classification; Generalization;
