Conference: Workshop on Computational Approaches to Linguistic Code-Switching

Abusive content detection in transliterated Bengali-English social media corpus

Abstract

Abusive text detection in low-resource languages such as Bengali is challenging because resources and tools are scarce. The ubiquity of transliterated Bengali comments on social media complicates the task further, since monolingual approaches cannot capture them. No transliterated Bengali corpus is yet publicly available for abusive content analysis. In this paper, we therefore introduce an annotated corpus of 3000 transliterated Bengali comments divided into two classes, abusive and non-abusive, with 1500 comments in each. For baseline evaluations, we employ several supervised machine learning (ML) and deep learning-based classifiers. We find that the support vector machine (SVM) classifier shows the highest efficacy in identifying abusive content. We make the annotated corpus publicly available to researchers to aid abusive content detection in Bengali social media data.
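
The abstract does not say which features or SVM settings were used, so the following is only a minimal sketch of what a linear SVM baseline for binary abusive/non-abusive classification might look like. Everything in it is an assumption for illustration: the character n-gram features (a common choice when transliterated spelling varies), the Pegasos-style training loop, the hyperparameters, and the four toy comments, which are invented and not taken from the corpus.

```python
from collections import Counter

# Hypothetical toy comments, invented for illustration; NOT from the corpus.
# Label +1 = abusive, -1 = non-abusive.
TOY_COMMENTS = [
    ("tui ekta gadha", 1),
    ("tui boka", 1),
    ("tumi khub bhalo", -1),
    ("onek dhonnobad", -1),
]


def char_ngrams(text, n_min=2, n_max=3):
    """Count character n-grams of a lowercased, space-padded comment.

    Character n-grams are an assumed feature choice, not the paper's.
    """
    padded = " " + text.lower().strip() + " "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def train_linear_svm(samples, labels, epochs=50, lam=0.01):
    """Train a bias-free linear SVM by subgradient descent on the hinge loss.

    Uses the Pegasos step size 1 / (lam * step). The epochs and lam
    values are arbitrary illustrative defaults.
    """
    weights = {}
    step = 0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            step += 1
            eta = 1.0 / (lam * step)
            # Margin is computed with the weights before this update.
            margin = y * sum(weights.get(g, 0.0) * v for g, v in feats.items())
            # L2 regularization shrinks every existing weight.
            for g in weights:
                weights[g] *= 1.0 - eta * lam
            # Hinge loss is active: move the weights toward the correct side.
            if margin < 1.0:
                for g, v in feats.items():
                    weights[g] = weights.get(g, 0.0) + eta * y * v
    return weights


def predict(weights, text):
    """Return +1 (abusive) or -1 (non-abusive) for a single comment."""
    feats = char_ngrams(text)
    score = sum(weights.get(g, 0.0) * v for g, v in feats.items())
    return 1 if score > 0 else -1


features = [char_ngrams(text) for text, _ in TOY_COMMENTS]
labels = [label for _, label in TOY_COMMENTS]
weights = train_linear_svm(features, labels)
predictions = [predict(weights, text) for text, _ in TOY_COMMENTS]
```

On a real corpus one would more likely hold out a test split and use an established implementation such as scikit-learn's `LinearSVC` rather than a hand-written loop; the loop is spelled out here only to show the hinge-loss update explicitly.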