Venue: International Conference on Complex Networks and Their Applications

A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

Abstract

Hateful and toxic content generated by a portion of social media users is a growing phenomenon, and it has motivated researchers to devote substantial effort to the challenging task of identifying hateful content. We need not only an efficient automatic hate speech detection model based on advanced machine learning and natural language processing, but also a sufficiently large amount of annotated data to train such a model. The lack of sufficient labelled hate speech data, along with existing biases, has been the main issue in this research domain. To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers). More specifically, we investigate BERT's ability to capture hateful context within social media content by using new fine-tuning methods based on transfer learning. To evaluate our proposed approach, we use two publicly available datasets that have been annotated for racism, sexism, hate, or offensive content on Twitter. The results show that our solution achieves considerable performance on these datasets in terms of precision and recall compared with existing approaches. In addition, our model can capture some biases in the data annotation and collection process, which can potentially lead to a more accurate model.
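The abstract reports results in terms of precision and recall on Twitter datasets annotated with classes such as racism and sexism. As a generic illustration (not the authors' evaluation code), the sketch below computes per-class precision and recall from gold and predicted labels. The function name `precision_recall` and the toy label lists are hypothetical and chosen only for the example.

```python
from collections import Counter


def precision_recall(y_true, y_pred, labels):
    """Per-class precision and recall for a multi-class classifier.

    Hypothetical helper for illustration. y_true and y_pred are equal-length
    sequences of label strings. Returns {label: (precision, recall)}; a metric
    is reported as 0.0 when its denominator is zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)   # how often each label was predicted
    true_count = Counter(y_true)   # how often each label occurs in the gold data
    result = {}
    for label in labels:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        result[label] = (precision, recall)
    return result


# Toy example with hypothetical labels (not data from the paper).
y_true = ["racism", "sexism", "none", "none", "sexism"]
y_pred = ["racism", "none", "none", "sexism", "sexism"]
metrics = precision_recall(y_true, y_pred, ["racism", "sexism", "none"])
for label, (p, r) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
# racism: precision=1.00 recall=1.00
# sexism: precision=0.50 recall=0.50
# none: precision=0.50 recall=0.50
```

Reporting 0.0 when a class is never predicted (or never occurs) is one common convention; other evaluation tools may warn or use a different default in that case.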