Detecting spam comments posted in micro-blogs using the self-extensible spam dictionary


Abstract

The high popularity of Weibo has greatly enriched people's lives, allowing online users to share their feelings by posting comments. However, a growing number of spam comments are also being posted on users' blogs on this platform. In this paper, to detect spam comments in Chinese micro-blogs effectively, we introduce semantic analysis to construct a Self-Extensible Spam Dictionary, which automatically expands itself as new words frequently emerge on micro-blogs. Semantic analysis provides additional features that are beneficial for detecting spam comments. A Proportion-Weight Filter (PWF) model is also proposed to detect two kinds of spam comments (AD and vulgar comments) by filtering on the spam-weight and the spam-proportion of Weibo comments, based on our Self-Extensible Spam Dictionary criteria. Our experimental results demonstrate that when detecting a combination of both AD and vulgar spam comments, we achieve an average detection accuracy of 87.9%. For AD spam comment detection in particular, we achieve an average accuracy of 96.2%, which compares favorably with machine learning methods. Statistical analysis of the results verifies that our proposed methods identify spam comments effectively and with relatively high accuracy.
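The abstract does not give the PWF formulas, so the following is only a minimal sketch of how a proportion-weight filter over a growing dictionary might look, not the authors' method. Everything in it is an assumption made for illustration: the names `SpamDictionary`, `spam_scores`, and `is_spam`, the threshold values, the rule that both thresholds must be met, and above all the extension step, which promotes a word after it has been seen a few times in flagged comments. The paper extends its dictionary through semantic analysis; a plain frequency count is swapped in here. Comments are assumed to be already segmented into tokens.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpamDictionary:
    """Hypothetical dictionary mapping spam words to weights."""

    weights: dict[str, float] = field(default_factory=dict)
    candidate_counts: dict[str, int] = field(default_factory=dict)
    promote_after: int = 3       # assumed: sightings needed before a word is added
    default_weight: float = 1.0  # assumed: weight given to newly added words

    def observe_spam(self, tokens: list[str]) -> None:
        """Count unknown tokens from a flagged comment; add frequent ones.

        Frequency counting stands in for the paper's semantic analysis.
        """
        for tok in tokens:
            if tok in self.weights:
                continue
            count = self.candidate_counts.get(tok, 0) + 1
            if count >= self.promote_after:
                self.weights[tok] = self.default_weight
                self.candidate_counts.pop(tok, None)
            else:
                self.candidate_counts[tok] = count


def spam_scores(tokens: list[str], dictionary: SpamDictionary) -> tuple[float, float]:
    """Return (spam-weight, spam-proportion) for one tokenized comment."""
    if not tokens:
        return 0.0, 0.0
    hits = [t for t in tokens if t in dictionary.weights]
    weight = sum(dictionary.weights[t] for t in hits)
    proportion = len(hits) / len(tokens)
    return weight, proportion


def is_spam(
    tokens: list[str],
    dictionary: SpamDictionary,
    weight_threshold: float = 2.0,      # assumed value
    proportion_threshold: float = 0.3,  # assumed value
) -> bool:
    """Flag a comment only when both scores reach their thresholds (assumed rule)."""
    weight, proportion = spam_scores(tokens, dictionary)
    return weight >= weight_threshold and proportion >= proportion_threshold
```

A typical use would be to call `is_spam` on each incoming comment and pass the tokens of flagged comments to `observe_spam`, so that new words enter the dictionary over time. In practice a stop-word list would be needed to keep common words from being promoted.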


Bibliographic details

Source: IEEE International Conference on Communications, 2016, pp. 1-7 (7 pages)
Authors: Chenwei Liu; Jiawei Wang; Kai Lei
Original format: PDF

