An approach to spam comment detection through domain-independent features



Abstract

Previous research in spam detection, especially in email spam filtering, has mainly focused on learning a set of discriminative features that are often present in spam content. Such commercially oriented spam is now detected well; the real challenge lies in filtering vaguer spam that does not exhibit distinctive spam keywords. We investigate two ways of detecting such spam: 1) comparing the similarity between publisher posts and user comments, and 2) learning a single representative meta-feature such as a user name or ID. The first measure relieves us of repeatedly learning a set of domain-dependent spam features, and the second enables us to detect potential spam users even before they take aggressive actions. Prior to the language model comparison in the first method, we supplement background information, normalize the text, perform co-reference resolution, and apply a word-to-word similarity measure, in the hope of enriching the language models and improving classification accuracy. To evaluate the first measure, we conduct experiments on detecting blog-spam comments. For the second measure, we apply an SVM to the ID space of e-mail data collected by “Apache Spam Assassin”.
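
The first measure (comparing a publisher post with a user comment through their language models) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which language model or divergence is used, so the add-alpha smoothed unigram model, the KL divergence, the function names, and the `threshold` parameter are all assumptions made for this example. The enrichment steps named in the abstract (background information, co-reference resolution, word-to-word similarity) are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a crude normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def comment_divergence(post, comment, alpha=1.0):
    """Divergence of the comment model from the post model (assumed score)."""
    post_tokens = tokenize(post)
    comment_tokens = tokenize(comment)
    vocab = set(post_tokens) | set(comment_tokens)
    if not vocab:
        return 0.0  # nothing to compare
    p_post = unigram_model(post_tokens, vocab, alpha)
    p_comment = unigram_model(comment_tokens, vocab, alpha)
    return kl_divergence(p_comment, p_post)


def is_suspicious(post, comment, threshold):
    """Flag a comment whose divergence exceeds a hypothetical threshold."""
    return comment_divergence(post, comment) > threshold


if __name__ == "__main__":
    post = ("Our new camera review covers lens quality, battery life "
            "and image sensor performance.")
    on_topic = ("Great review, the battery life and lens quality "
                "matter most to me.")
    off_topic = "Nice blog, visit my site for cheap loans and casino bonuses."
    print(comment_divergence(post, on_topic))
    print(comment_divergence(post, off_topic))
```

In this toy example the off-topic comment scores a higher divergence than the on-topic one, even though it contains no word that a keyword filter trained on another domain would necessarily know. A usable threshold would have to be tuned on labeled data.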


Bibliographic details

Source
International Conference on Big Data and Smart Computing | 2016 | 4 pages
Authors
Jong Myoung Kim; Zae Myung Kim; Kwangjo Kim
Original format
PDF
CLC classification
Computing technology, computer technology
Keywords
machine learning; spam filtering; spam user detection


Similar literature


1. A Self-Supervised Approach to Comment Spam Detection Based on Content Analysis [J]. A. Bhattarai, D. Dasgupta. International Journal of Information Security and Privacy. 2011, No. 1
2. Email Spam Detection: A Symbiotic Feature Selection Approach Fostered by Evolutionary Computation [J]. Pedro Sousa, Paulo Cortez, Rui Vaz. International Journal of Information Technology & Decision Making. 2013, No. 4
3. Spam Comment Detection in Blog Comments from Blog RSS Feed by Modified TF-IDF Algorithm [J]. Fouzia Sultana, Dr. Stephen Charles, Dr. A. Govardhan. International Journal of Engineering Science and Technology. 2012, No. 3
4. An approach to spam comment detection through domain-independent features [C]. Jong Myoung Kim, Zae Myung Kim, Kwangjo Kim. International Conference on Big Data and Smart Computing. 2016
5. A sublexical unit based hash model approach for spam detection [D]. Zhang, Like. 2009
6. Features versus Context: An approach for precise and detailed detection and delineation of faces and facial features [O]. Liya Ding, Aleix M. Martinez
7. A Self-supervised Approach to Comment Spam Detection based on Content Analysis [O]. A. Bhattarai. 2013
