Published in: ACM International Conference on Information and Knowledge Management

Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus


Abstract

In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content on Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively when paired with a variety of machine learning (ML) algorithms. For instance, using Logistic Regression, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline, about 3.77%. Our approach provides an alternative to the large-scale hand annotation efforts required by fully supervised learning approaches.
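To make the two reported metrics concrete, the following is a minimal sketch of a keyword matching baseline and of how a true positive rate and false positive rate can be computed from its predictions. It is not the authors' implementation: the lexicon, the tokenizer, the function names `keyword_baseline` and `tp_fp_rates`, and the four toy tweets are all hypothetical placeholders, and mild stand-in words are used in place of real profanity.

```python
import re

# Hypothetical placeholder lexicon; the paper's actual keyword list is not given here.
KEYWORDS = {"darn", "heck"}

def keyword_baseline(tweet, keywords=KEYWORDS):
    """Flag a tweet as offensive (True) if any lowercase token is in the lexicon."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(tok in keywords for tok in tokens)

def tp_fp_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Made-up toy data, for illustration only (True = offensive).
tweets = [
    "what the heck is this",   # offensive, contains a lexicon word
    "lovely weather today",    # not offensive
    "you d4rn fool",           # offensive, but the spelling evades the lexicon
    "heck of a good game",     # not offensive, but contains a lexicon word
]
labels = [True, False, True, False]

preds = [keyword_baseline(t) for t in tweets]
tpr, fpr = tp_fp_rates(labels, preds)
print(f"TP rate = {tpr:.2f}, FP rate = {fpr:.2f}")
```

On this toy set the baseline misses one offensive tweet and wrongly flags one benign tweet, so both rates come out at 0.50. The figures in the abstract are the same two quantities measured on the 4029 testing tweets.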
