Published in: Information Processing & Management
Offensive, aggressive, and hate speech analysis: From data-centric to human-centered approach

Abstract

Analysis of subjective texts such as offensive content or hate speech is a great challenge, especially regarding the annotation process. Most current annotation procedures aim to achieve a high level of agreement in order to generate a high-quality reference source. However, annotation guidelines for subjective content may restrict the annotators' freedom of decision making. Motivated by the moderate annotation agreement in offensive content datasets, we hypothesize that personalized approaches to offensive content identification should be in place. Thus, we propose two novel perspectives of perception: group-based and individual. Using demographics of annotators as well as embeddings of their previous decisions (annotated texts), we are able to train multimodal models (including transformer-based ones) adjusted to personal or community profiles. Based on the agreement of individuals and groups, we experimentally showed that annotator group agreeability strongly correlates with offensive content recognition quality. The proposed personalized approaches enabled us to create models adaptable to personal user beliefs rather than to an agreed understanding of offensiveness. Overall, our individualized approaches to offensive content classification outperform classic data-centric methods that generalize offensiveness perception, and this holds for all six tested models. Additionally, we developed requirements for annotation procedures, personalization, and content processing to make the solutions human-centered.
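
The abstract speaks of "moderate annotation agreement" without naming the measure behind it. Purely as an illustration, the sketch below computes Cohen's kappa, one common chance-corrected agreement statistic for two annotators. The choice of kappa, the function name `cohen_kappa`, and the example labels are assumptions made here and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same texts."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of texts that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (1 = offensive, 0 = not offensive) for six texts.
annotator_1 = [1, 0, 1, 1, 0, 0]
annotator_2 = [1, 0, 0, 1, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A value near 1 indicates near-perfect agreement and a value near 0 indicates agreement no better than chance. Low values on subjective labels are the kind of signal that motivates modelling individual or group perception instead of a single agreed label.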