A Parsimonious and Practical Approach to Detecting Offensive Speech

Abstract

With the proliferation of hateful and offensive speech on social media platforms such as Twitter, machine learning approaches to detecting such toxic content have gained prominence. Despite these advances, detecting such speech in real time, while it is being shared on these platforms, remains a challenge for two reasons. First, these approaches train complex models on a plethora of features, which calls into question their computational efficiency for real-time deployment. Second, they require sizeable, manually annotated data sets from the same context, and annotating large data sets is extremely time-consuming, error-prone, and cumbersome. This paper proposes a parsimonious and practical approach to detecting offensive speech that alleviates these challenges. The approach is parsimonious because it comprehensively evaluates commonly used machine learning models (Logistic Regression, Random Forest, Neural Networks) on two public-domain data sets. That evaluation demonstrates that a simple Logistic Regression model trained on unigrams with frequency counts can detect hate speech with an accuracy of over 90%. The approach is practical because it demonstrates how an existing labeled training data set can be used to train models that detect offensive content in a previously unseen data set with moderate accuracy. Based on these findings, the paper offers guidance on the characteristics that may be desirable in benchmark training data sets for offensive speech detection.
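The abstract's central claim is that a plain Logistic Regression over unigram frequency counts is already competitive. The sketch below illustrates that pipeline in dependency-free Python and makes several assumptions that are not the authors' implementation or data. The tokenizer, the SGD training loop and its hyperparameters (`epochs`, `lr`), and the six-sentence toy corpus are all hypothetical, chosen only for illustration. Scoring a second, separately written toy set loosely mimics the paper's cross-data-set setting: tokens never seen in training are simply dropped from the feature vector.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(docs):
    # Map each distinct training unigram to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(doc, vocab):
    # Sparse unigram frequency counts {feature_index: count};
    # tokens absent from the training vocabulary are dropped.
    counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
    return {vocab[tok]: c for tok, c in counts.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, n_features, epochs=200, lr=0.1):
    # Unregularized logistic regression fitted by stochastic gradient descent.
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = sigmoid(b + sum(w[j] * v for j, v in x.items())) - label
            for j, v in x.items():
                w[j] -= lr * err * v
            b -= lr * err
    return w, b


def predict(x, w, b):
    # Label 1 (offensive) when the estimated P(offensive | x) >= 0.5.
    return 1 if sigmoid(b + sum(w[j] * v for j, v in x.items())) >= 0.5 else 0


def accuracy(docs, labels, vocab, w, b):
    hits = sum(predict(featurize(d, vocab), w, b) == t for d, t in zip(docs, labels))
    return hits / len(labels)


# Hypothetical toy data (1 = offensive, 0 = not offensive); not from the paper.
TRAIN_DOCS = [
    "you are an idiot",
    "what a stupid idiot",
    "have a great day",
    "thanks for sharing this",
    "shut up you moron",
    "lovely photo of the beach",
]
TRAIN_LABELS = [1, 1, 0, 0, 1, 0]

# A second, separately written toy set standing in for an unseen data set.
TEST_DOCS = ["you stupid moron", "great photo thanks"]
TEST_LABELS = [1, 0]

if __name__ == "__main__":
    vocab = build_vocab(TRAIN_DOCS)
    X = [featurize(d, vocab) for d in TRAIN_DOCS]
    w, b = train_logreg(X, TRAIN_LABELS, len(vocab))
    print("train accuracy:", accuracy(TRAIN_DOCS, TRAIN_LABELS, vocab, w, b))
    print("cross-set accuracy:", accuracy(TEST_DOCS, TEST_LABELS, vocab, w, b))
```

In practice a library implementation such as scikit-learn's `CountVectorizer` combined with `LogisticRegression` would likely be used instead. The hand-written version only makes the two ingredients explicit: count features and a linear logistic model.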

Bibliographic Details

Source: International Conference on Computing, Communication and Intelligent Systems, 2021, pp. 688-695 (8 pages)
Authors: Haseeb Khan; Frances Yu; Amar Sinha; Swapna S. Gokhale
Original format: PDF
Keywords: Voice activity detection; Social networking (online); Training data; Data models; Real-time systems; Random forests; Logistics


Similar Documents


1. Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets [J]. Oriola Oluwafemi, Kotze Eduan. Quality Control, Transactions. 2020.

2. Did You Really Beat the Market? A Practical and Parsimonious Approach to Evaluating Risk-Adjusted Performance [J]. David J. Moore. Journal of Mathematical Finance. 2021, no. 3.

3. A Practical pedestrian approach to parsimonious regression with inaccurate inputs [J]. Seppo Karrila. Sonklanakarin Journal of Science and Technology. 2014, no. 2.

4. UNBNLP at SemEval-2019 Task 5 and 6: Using Language Models to Detect Hate Speech and Offensive Language [C]. Ali Hakimi Parizi, Milton King, Paul Cook. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; International Workshop on Semantic Evaluation. 2019.

5. Detecting Offensive Social Media Text in Nepali Language [D]. Timilsina, Sandesh. 2020.

6. A novel quantitation approach for maximizing detectable targets for offensive/volatile odorants with diverse functional groups by thermal desorption-gas chromatography-mass spectrometry [O]. Yong-Hyun Kim, Ki-Hyun Kim.

7. Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets [O]. Oluwafemi Oriola, Eduan Kotze. 2020.

8. Good Bugs, Bad Bugs: A Modern Approach for Detecting Offensive Biological Weapons Research [R]. Moodie, M., Loeb, C., Armstrong, R. 2008.
