Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus

Abstract

In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets using Logistic Regression, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the same level as the baseline at about 3.77%. Our approach provides an alternative to large scale hand annotation efforts required by fully supervised learning approaches.
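
The two metrics quoted in the abstract can be made concrete with a small sketch. The true positive rate is TP / (TP + FN) and the false positive rate is FP / (FP + TN). The code below computes both for a simple keyword-matching classifier of the kind the abstract uses as its baseline. The lexicon, the tokenizer, the function names, and the four toy tweets are all assumptions made for illustration; none of them come from the paper, and this is not the authors' implementation.

```python
import re

# Hypothetical placeholder lexicon; the paper's actual keyword list is not given here.
OFFENSIVE_KEYWORDS = {"badword", "slur"}


def keyword_baseline(tweet, lexicon=OFFENSIVE_KEYWORDS):
    """Flag a tweet as offensive if any lowercase token appears in the lexicon."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(tok in lexicon for tok in tokens)


def tp_fp_rates(labels, predictions):
    """Return (true positive rate, false positive rate).

    labels and predictions are sequences of booleans; True means offensive.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy data, invented for illustration only.
tweets = [
    "you are a badword",              # offensive, matches the lexicon
    "nice weather today",             # benign, no match
    "such a s1ur",                    # offensive, but the spelling evades the lexicon
    "badword is a word I looked up",  # benign, but matches the lexicon
]
labels = [True, False, True, False]

preds = [keyword_baseline(t) for t in tweets]
tpr, fpr = tp_fp_rates(labels, preds)
print(f"TP rate = {tpr:.2f}, FP rate = {fpr:.2f}")
```

Any classifier that returns a boolean per tweet can be passed through `tp_fp_rates` in the same way, which is how a learned model and the keyword baseline would be compared at a matched false positive rate.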

Bibliographic record

Source
ACM international conference on information and knowledge management | 2012 | pp. 1980-1984 | 5 pages
Authors
Guang Xiang; Bin Fan; Ling Wang; Jason I. Hong; Carolyn P. Rose
Original format
PDF
Keywords
Twitter; Hadoop; topic modeling; machine learning

Similar literature

1. Towards the Discovery of Influencers to Follow in Micro-Blogs (Twitter) by Detecting Topics in Posted Messages (Tweets) [J]. Mubashir Ali, Anees Baqir, Giuseppe Psaila. Applied Sciences. 2020, issue 16.
2. Personality Adjectives in Twitter Tweets and in the Google Books Corpus. An Analysis of the Facet Structure of the Openness Factor of Personality [J]. Roivainen Eka. Current Psychology. 2015, issue 4.
3. Adding Twitter-Specific Features to Stylistic Features for Classifying Tweets by User Type and Number of Retweets [J]. Yui Arakawa, Akihiro Kameda, Akiko Aizawa. Journal of the American Society for Information Science and Technology. 2014, issue 7.
4. Detecting Offensive Tweets via Topical Feature Discovery over a Large Scale Twitter Corpus [C]. Guang Xiang, Bin Fan, Ling Wang. ACM international conference on information and knowledge management. 2012.
5. Kaizen Programming with Enhanced Feature Discovery: An Automated Approach to Feature Selection and Feature Discovery for Prediction Models [D]. Stelmack, John. 2020.
6. Where in the world is my tweet: Detecting irregular removal patterns on Twitter [O]. Joan C. Timoneda. 2012.
7. Towards the Discovery of Influencers to Follow in Micro-Blogs (Twitter) by Detecting Topics in Posted Messages (Tweets) [O]. Mubashir Ali, Anees Baqir, Giuseppe Psaila. 2020.
8. Detecting Malicious Tweets in Twitter Using Runtime Monitoring With Hidden Information [R]. Yilmaz, A. 2016.
