Workshop on Online Abuse and Harms

Reducing Unintended Identity Bias in Russian Hate Speech Detection


Abstract

Toxicity has become a grave problem for many online communities and has been growing across many languages, including Russian. Hate speech creates an environment of intimidation and discrimination, and may even incite real-world violence. Both researchers and social platforms have been working for some time on models that detect toxicity in online communication. A common problem with these models is bias towards certain words (e.g. woman, black, jew, or женщина, черный, еврей) that are not toxic in themselves but act as triggers for the classifier due to model caveats. In this paper, we describe our efforts towards classifying hate speech in Russian and propose simple techniques for reducing unintended bias, such as generating training data with language models using terms and words related to protected identities as context, and applying word dropout to such words.
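The word-dropout idea mentioned in the abstract can be sketched roughly as follows: during training, tokens belonging to a list of protected-identity terms are randomly removed, so the classifier cannot lean on them as shortcuts for toxicity. The term list, function name, and dropout probability below are illustrative assumptions, not details taken from the paper.

```python
import random

# Illustrative lexicon of protected-identity terms, using the examples
# quoted in the abstract; a real system would use a curated list.
IDENTITY_TERMS = {"woman", "black", "jew", "женщина", "черный", "еврей"}

def identity_word_dropout(tokens, p=0.5, seed=None):
    """Drop each protected-identity token with probability p.

    tokens: list of word strings from a training example.
    p: dropout probability for identity terms (non-identity words are kept).
    """
    rng = random.Random(seed)
    return [
        t for t in tokens
        if t.lower() not in IDENTITY_TERMS or rng.random() >= p
    ]

# With p=1.0 every identity term is removed deterministically:
print(identity_word_dropout("the woman wrote a kind comment".split(), p=1.0))
```

Applied only at training time, this forces the model to judge toxicity from the surrounding context rather than from the mere presence of an identity word.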
