International Conference on Big Data and Smart Computing

An approach to spam comment detection through domain-independent features



Abstract

Previous research on spam detection, especially e-mail spam filtering, has mainly focused on learning a set of discriminative features that frequently appear in spam content. Such commercially oriented spam is now detected reliably; the real challenge lies in filtering vaguer spam that does not contain distinctive spam keywords. We investigate two ways of detecting such spam: 1) comparing the similarity between a publisher's post and the user comments on it, and 2) learning a single representative meta-feature, such as the user name or ID. The first method frees us from repeatedly learning a set of domain-dependent spam features, and the second enables us to detect potential spam users even before they perform any aggressive actions. Before the language-model comparison in the first method, we supplement background information, normalize the text, perform co-reference resolution, and compute word-to-word similarity, with the aim of enriching the language models and thereby improving classification accuracy. To evaluate the first method, we conduct experiments on detecting blog-spam comments. For the second method, we apply an SVM to the ID space of e-mail data collected by “Apache Spam Assassin”.
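The abstract does not say which language-model comparison is used. As a rough illustration of the first method only, the sketch below assumes unigram models smoothed against a background model built from the post and its comments (Jelinek-Mercer interpolation), and scores each comment by the KL divergence from the comment's model to the post's model: the larger the divergence, the less the comment resembles the post. The smoothing weight `lam`, the cut-off `threshold=1.5`, the toy texts, and all helper names are hypothetical; co-reference resolution and word-to-word similarity are omitted.

```python
import math
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric tokens (a simple stand-in for text normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def background_model(texts: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram model over all given texts (assumed background source)."""
    counts = Counter(tok for t in texts for tok in normalize(t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def unigram_lm(tokens: list[str], background: dict[str, float], lam: float) -> dict[str, float]:
    """Jelinek-Mercer smoothing: lam * P_ML(w | doc) + (1 - lam) * P(w | background)."""
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    vocab = set(counts) | set(background)
    return {w: lam * counts[w] / total + (1 - lam) * background.get(w, 0.0) for w in vocab}


def kl_divergence(p: dict[str, float], q: dict[str, float], eps: float = 1e-12) -> float:
    """KL(p || q) summed over words with p(w) > 0; eps guards against a zero q(w)."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps)) for w, pw in p.items() if pw > 0)


def flag_spam_comments(post: str, comments: list[str], threshold: float = 1.5, lam: float = 0.8):
    """Return (comment, divergence, is_spam) triples; threshold is a hypothetical cut-off."""
    bg = background_model([post] + comments)
    post_lm = unigram_lm(normalize(post), bg, lam)
    results = []
    for c in comments:
        d = kl_divergence(unigram_lm(normalize(c), bg, lam), post_lm)
        results.append((c, d, d > threshold))
    return results


if __name__ == "__main__":
    post = "Our team released a new version of the open source image library with faster decoding."
    comments = [
        "Faster decoding in the new image library version is great news for our team.",
        "Cheap watches and replica bags, visit my page now!",
    ]
    for comment, score, is_spam in flag_spam_comments(post, comments):
        print(f"{score:.2f} spam={is_spam} {comment}")
```

Building the background model from the post and its comments guarantees every word has non-zero probability under both smoothed models, so the divergence stays finite. Note that this scoring needs no spam keywords, which is the domain-independence the first method relies on.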
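For the second method, the abstract states only that an SVM is trained on the ID space; it does not give the feature representation or the solver. The sketch below assumes each sender ID is represented by its character-bigram counts and trains a linear SVM with the Pegasos stochastic sub-gradient algorithm (a standard solver swapped in for illustration, without a bias term). The IDs are hypothetical toy data, labelled +1 for spam and -1 for legitimate senders, and `lam` and `epochs` are untuned assumptions.

```python
from collections import Counter


def char_ngrams(user_id: str, n: int = 2) -> Counter:
    """Bag of character n-grams of a lowercased user ID (hypothetical feature choice)."""
    s = user_id.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def dot(w: dict, x: Counter) -> float:
    """Sparse inner product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, lam: float = 0.1, epochs: int = 20) -> dict:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    w: dict = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # hinge loss is active for this example
            for f in w:  # regularization step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if violated:  # loss step: push the example toward the correct side of the margin
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w: dict, user_id: str) -> int:
    """+1 = likely spam sender, -1 = likely legitimate sender."""
    return 1 if dot(w, char_ngrams(user_id)) > 0 else -1


if __name__ == "__main__":
    ids = ["xk7q9z", "qz9x7k", "zk7xq9", "anna.lee", "lena.allen", "neal.lane"]
    labels = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm([char_ngrams(i) for i in ids], labels)
    for new_id in ["xk7q9z7", "anna.leeds"]:
        print(new_id, "spam" if predict(w, new_id) == 1 else "ham")
```

Because the score depends only on the ID string, a new sender can be scored before it sends anything, which is the early-detection property the abstract claims for this method. On real data the n-gram length and regularization strength would need tuning.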
