Conference: International Conference on Big Data and Smart Computing

An approach to spam comment detection through domain-independent features


Abstract

Previous research on spam detection, especially email spam filtering, has mainly focused on learning a set of discriminative features that often appear in spam content. Such commercially oriented spam is now detected well; the real challenge lies in filtering vaguer spam that does not exhibit distinctive spam keywords. We investigate two ways of detecting such spam: (1) comparing the similarity between publisher posts and user comments, and (2) learning a single representative meta-feature such as the user name or ID. The first measure relieves us of repeatedly learning a set of domain-dependent spam features, and the second enables us to detect potential spam users even before they take aggressive action. Before the language-model comparison in the first method, we supplement background information, normalize the text, perform co-reference resolution, and compute word-to-word similarity, in the hope of enriching the language models and improving classification accuracy. To evaluate the first measure, we conduct experiments on detecting blog spam comments. For the second measure, we apply an SVM to the ID space of email data collected by Apache SpamAssassin.
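The post-versus-comment comparison can be illustrated with a minimal sketch. It assumes smoothed unigram language models and KL divergence as the comparison; the abstract does not say which language model or divergence the authors use, so these are illustrative choices. The function names, the smoothing constant `alpha`, and the example texts are hypothetical, and the enrichment steps (background information, co-reference resolution, word-to-word similarity) are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a crude normalizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_model(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def post_comment_divergence(post, comment, alpha=0.1):
    """Divergence of the comment model from the post model; lower means more similar."""
    post_tokens, comment_tokens = tokenize(post), tokenize(comment)
    vocab = set(post_tokens) | set(comment_tokens)
    if not vocab:
        return 0.0
    p = unigram_model(post_tokens, vocab, alpha)
    q = unigram_model(comment_tokens, vocab, alpha)
    return kl_divergence(p, q)


# Made-up texts for illustration only (not from the paper's data).
post = "We compare SVM kernels for text classification on blog data."
on_topic = "Which SVM kernels worked best for the blog text classification?"
off_topic = "Great site! Visit my page for cheap watches and free gifts."

d_on = post_comment_divergence(post, on_topic)
d_off = post_comment_divergence(post, off_topic)
print(f"on-topic divergence:  {d_on:.3f}")
print(f"off-topic divergence: {d_off:.3f}")
```

A comment whose divergence from the post exceeds some threshold would be flagged as a spam candidate; the threshold itself would have to be tuned on labeled data and is not specified here.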
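Classifying on the ID alone can be sketched as follows, assuming character-bigram features and a linear SVM trained by subgradient descent on the hinge loss. The abstract does not describe the feature encoding, kernel, or training procedure, so all of these are assumptions, and the IDs below are invented for illustration rather than taken from the SpamAssassin data. The bias term is left out to keep the sketch short.

```python
from collections import Counter, defaultdict


def char_ngrams(user_id, n=2):
    """Count overlapping character n-grams of a lowercased ID."""
    s = user_id.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for user_id, y in examples:           # y is +1 (spam) or -1 (ham)
            x = char_ngrams(user_id)
            margin = y * sum(w[f] * v for f, v in x.items())
            for f in list(w):                 # regularization shrinks every weight
                w[f] *= 1.0 - lr * lam
            if margin < 1.0:                  # hinge loss active: step toward y
                for f, v in x.items():
                    w[f] += lr * y * v
    return w


def predict(w, user_id):
    """Return +1 (spam) if the linear score is positive, else -1 (ham)."""
    score = sum(w.get(f, 0.0) * v for f, v in char_ngrams(user_id).items())
    return 1 if score > 0 else -1


# Invented toy IDs: +1 marks a spam-like ID, -1 an ordinary one.
train = [
    ("xk92jq7z", 1), ("john.smith", -1),
    ("qz81vx03", 1), ("mary_jones", -1),
    ("zz7743xq", 1), ("alice.wong", -1),
]
weights = train_linear_svm(train)
print(predict(weights, "xq77zz01"), predict(weights, "bob.jones"))
```

Because the score depends only on the ID string, a new account can be scored before it posts anything, which is the property the second measure relies on.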
