
Probing Toxic Content in Large Pre-Trained Language Models



Abstract

Large pre-trained language models (PTLMs) have been shown to carry biases towards different social groups, which leads to the reproduction of stereotypical and toxic content by major NLP systems. We propose a method based on logistic regression classifiers to probe English, French, and Arabic PTLMs and quantify the potentially harmful content that they convey with respect to a set of templates. Each template is prompted by the name of a social group followed by a cause-effect relation. We use PTLMs to predict masked tokens at the end of a sentence in order to examine how likely they are to enable toxicity towards specific communities. We shed light on how such negative content can be triggered within unrelated and benign contexts, drawing on evidence from a large-scale study, and then explain how our methodology can be used to assess and mitigate the toxicity transmitted by PTLMs.
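
To make the probing setup more concrete, below is a minimal sketch of template-based masked-token prediction using the HuggingFace transformers fill-mask pipeline. The model name, template string, and group names are illustrative assumptions for this sketch, not the paper's actual prompt set, and the toxicity scoring step (the logistic regression classifier) is only indicated in a comment.

# Minimal sketch: probe a masked language model with a templated prompt.
# Assumptions: bert-base-uncased as the PTLM; placeholder template and
# group names; the paper's own templates and classifier are not reproduced here.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical template: a social-group name followed by a cause-effect
# relation, with the final token masked for the PTLM to complete.
groups = ["immigrants", "teenagers"]  # placeholder group names
template = "Because they are {group}, people think they are [MASK]."

for group in groups:
    prompt = template.format(group=group)
    # Top predictions for the masked slot; in the paper's setting these
    # completions would then be scored for toxicity, e.g. with a separately
    # trained logistic regression classifier.
    for pred in fill_mask(prompt, top_k=5):
        print(group, pred["token_str"], round(pred["score"], 4))
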
