Abusive content detection in transliterated Bengali-English social media corpus

Abstract

Abusive text detection in low-resource languages such as Bengali is challenging because resources and tools are scarce. The ubiquity of transliterated Bengali comments on social media makes the task harder still, as monolingual approaches cannot capture them. No transliterated Bengali corpus is yet publicly available for abusive content analysis. In this paper, we therefore introduce an annotated corpus of 3000 transliterated Bengali comments divided into two classes, abusive and non-abusive, with 1500 comments in each. For baseline evaluation, we employ several supervised machine learning (ML) and deep learning-based classifiers. We find that the support vector machine (SVM) classifier shows the highest efficacy in identifying abusive content. We make the annotated corpus publicly available to researchers to aid abusive content detection in Bengali social media data.
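
The abstract does not say which features or SVM implementation were used, so the following is only a minimal sketch of what a linear SVM baseline for this two-class task could look like. Everything in it is an assumption made for illustration: the character n-gram features (chosen here because transliterated spelling varies), the small pure-Python trainer that stands in for a library SVM, the function names, and the six toy comments, which are invented and not drawn from the paper's corpus.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Count character n-grams of a comment, padded with one space on each side."""
    padded = f" {text.lower().strip()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def score(weights, bias, feats):
    """Linear decision value w.x + b for a sparse feature dictionary."""
    return sum(weights.get(g, 0.0) * c for g, c in feats.items()) + bias


def train_linear_svm(texts, labels, epochs=50, lr=0.01, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss.

    labels must be +1 (abusive) or -1 (non-abusive).
    """
    samples = [char_ngrams(t) for t in texts]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            violated = y * score(weights, bias, feats) < 1
            # Regularisation step: shrink every weight slightly.
            for g in weights:
                weights[g] *= 1.0 - lr * lam
            # Hinge-loss step: only samples inside the margin move the model.
            if violated:
                for g, c in feats.items():
                    weights[g] = weights.get(g, 0.0) + lr * y * c
                bias += lr * y
    return weights, bias


def predict(model, text):
    """Map the sign of the decision value to a class name."""
    weights, bias = model
    value = score(weights, bias, char_ngrams(text))
    return "abusive" if value > 0 else "non-abusive"


# Invented toy comments (NOT from the paper's corpus), three per class.
texts = [
    "tumi khub bhalo",
    "onek dhonnobad bhai",
    "khub sundor gaan",
    "tui ekta gadha",
    "tui ekta boka",
    "chup kor gadha",
]
labels = [-1, -1, -1, 1, 1, 1]

model = train_linear_svm(texts, labels)
predictions = [predict(model, t) for t in texts]
```

Calling `predict(model, "tui boka")` would then classify an unseen comment. With six training strings the result says nothing about real performance; a realistic baseline would train on the full 3000-comment corpus and report held-out accuracy.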

Bibliographic details

Source: Workshop on Computational Approaches to Linguistic Code-Switching, 2021, pp. 125-130 (6 pages)
Author: Salim Sazzed
Original format: PDF
