Journal: Quality Control, Transactions

Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets



Abstract

In recent times, South Africa has been witnessing an upsurge of offensive and hate speech along racial and ethnic lines on Twitter. Among the South African languages used, English is popular. Although machine learning has been successfully used to detect offensive and hate speech in several English contexts, the distinctiveness of South African tweets and the similarities among offensive, hate and free speech require a domain-specific English corpus and techniques to detect the offensive and hate speech. Thus, we developed an English corpus from South African tweets and evaluated different machine learning techniques to detect offensive and hate speech. Character n-gram, word n-gram, negative sentiment and syntactic-based features, and their hybrid, were extracted and analyzed using hyper-parameter optimization, ensemble and multi-tier meta-learning models of support vector machine, logistic regression, random forest and gradient boosting algorithms. The results showed that optimized support vector machine with character n-gram performed best in detection of hate speech with a true positive rate of 0.894, while optimized gradient boosting with word n-gram performed best in detection of hate speech with a true positive rate of 0.867. However, their performance in detecting the other threatening classes was poor. Multi-tier meta-learning models achieved the most consistent and balanced classification performance, with true positive rates of 0.858 and 0.887 for hate speech and offensive speech, respectively, as well as a true positive rate of 0.646 for free speech and an overall accuracy of 0.671. The error analysis showed that the multi-tier meta-learning model could reduce the misclassification error rate of the optimized models by 34.26%.
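
The abstract reports per-class true positive rates, an overall accuracy and a reduction in misclassification error rate. As a minimal sketch of how such figures are commonly computed, the Python below derives them from a list of true and predicted labels. The function names and the toy labels are hypothetical and are not taken from the study. The sketch also assumes the reported reduction is relative to the baseline error rate, which the abstract does not state.

```python
from collections import Counter


def per_class_tpr(y_true, y_pred):
    """True positive rate (recall) per class: TP / (TP + FN)."""
    support = Counter(y_true)  # number of true examples in each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return {label: hits[label] / support[label] for label in support}


def accuracy(y_true, y_pred):
    """Fraction of all examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_error_reduction(baseline_error, new_error):
    """Relative drop in error rate (assumed interpretation of the reported reduction)."""
    return (baseline_error - new_error) / baseline_error


# Toy labels invented for illustration only; they are not data from the study.
y_true = ["hate", "hate", "offensive", "offensive", "free", "free", "free", "free"]
y_pred = ["hate", "offensive", "offensive", "offensive", "free", "free", "hate", "offensive"]

tpr = per_class_tpr(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(tpr, acc)
```

Reporting the true positive rate per class rather than accuracy alone matters here because a model can score well on one class (for example hate speech) while performing poorly on the others, which is the imbalance the abstract describes.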
