International Journal of Innovative Computing Information and Control

POLARITYSPAM: PROPAGATING CONTENT-BASED INFORMATION THROUGH A WEB-GRAPH TO DETECT WEB-SPAM

Abstract

Spam web pages have become a problem for Information Retrieval systems because of the negative effects they can have on the results these systems return. In this work we tackle the problem of detecting such pages with a propagation algorithm that takes a web graph as input and selects a set of spam and non-spam web pages whose spam likelihood is then spread over the rest of the network. In this way we exploit the links between pages to rank them according to both their relevance and their spam likelihood. Our intuition is to give a high reputation to pages related to relevant ones, and a high spam likelihood to pages linked to spam pages. The novelty we introduce is to include the content of the web pages in the computation of an a priori estimate of each page's spam likelihood, and to propagate this information through the graph. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how bad or good (spam-like or not) a web page is, according to both its textual content and its relations in the graph. The experimental results show that our method outperforms other techniques for spam detection.
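The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea rather than the PolaritySpam algorithm itself. It assumes that a content-based spam prior for every page is already available (the `content_prior` values below are invented for illustration, standing in for the output of some text classifier), picks the least and most spam-like pages as seeds, and swaps in TrustRank/Anti-TrustRank-style personalized PageRank as the propagation scheme. The function names, the damping factor of 0.85, the 50 iterations and the toy graph are all hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # page -> pages it links to (hypothetical representation)


def all_nodes(graph: Graph) -> List[str]:
    """Every page that appears as a source or a link target, in a stable order."""
    seen = dict.fromkeys(graph)
    for outs in graph.values():
        seen.update(dict.fromkeys(outs))
    return list(seen)


def reverse(graph: Graph) -> Graph:
    """Transpose the graph so that an edge u -> v becomes v -> u."""
    rev: Graph = {n: [] for n in all_nodes(graph)}
    for u, outs in graph.items():
        for v in outs:
            rev[v].append(u)
    return rev


def propagate(graph: Graph, seeds: Dict[str, float],
              damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Personalized-PageRank-style spreading of seed mass along the graph's edges."""
    nodes = all_nodes(graph)
    total = sum(seeds.values()) or 1.0
    base = {n: seeds.get(n, 0.0) / total for n in nodes}  # normalized seed vector
    score = dict(base)
    for _ in range(iters):
        # Each step restarts (1 - damping) of the mass at the seeds...
        nxt = {n: (1.0 - damping) * base[n] for n in nodes}
        # ...and splits the rest evenly over each page's outgoing edges.
        for u in nodes:
            outs = graph.get(u, [])
            if not outs:
                continue  # simplification: mass at dangling pages is not forwarded
            share = damping * score[u] / len(outs)
            for v in outs:
                nxt[v] += share
        score = nxt
    return score


def polarity_scores(graph: Graph, content_prior: Dict[str, float],
                    k: int = 1) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (positive, negative) scores for every page.

    content_prior[n] is an assumed a priori spam probability in [0, 1]
    derived from the text of page n.
    """
    ranked = sorted(content_prior, key=content_prior.get)  # least to most spam-like
    good_seeds = {n: 1.0 - content_prior[n] for n in ranked[:k]}
    spam_seeds = {n: content_prior[n] for n in ranked[-k:]}
    # Reputation flows forward: pages linked *from* good pages gain it.
    positive = propagate(graph, good_seeds)
    # Spam likelihood flows backward: pages linking *to* spam pages gain it.
    negative = propagate(reverse(graph), spam_seeds)
    return positive, negative


# Toy web graph: a small legitimate cluster and a small link farm.
web = {
    "news": ["blog", "shop"],
    "blog": ["news"],
    "shop": ["news"],
    "farm1": ["pills"],
    "farm2": ["pills"],
    "pills": ["farm1", "farm2"],
}
prior = {"news": 0.05, "blog": 0.2, "shop": 0.3,
         "farm1": 0.7, "farm2": 0.6, "pills": 0.95}
pos, neg = polarity_scores(web, prior)
for page in web:
    print(f"{page:6s} positive={pos[page]:.3f} negative={neg[page]:.3f}")
```

Running the negative propagation over the transposed graph is what lets pages that point to spam inherit spam likelihood, while the positive score follows links forward from trusted pages. A page could then be flagged when its negative score dominates its positive one, but the abstract does not say how the paper combines or thresholds the two scores.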