
Webspam demotion: Low complexity node aggregation methods



Abstract

Search engine result pages (SERPs) for a specific query are constructed according to several mechanisms. One of them consists of ranking Web pages by their importance, regardless of their semantics. Indeed, relevance to a query is not enough to provide high-quality results, and popularity is used to arbitrate between equally relevant Web pages. The best-known algorithm that ranks Web pages according to their popularity is PageRank. The term Webspam was coined to denote Web pages created for the sole purpose of fooling ranking algorithms such as PageRank: the goal of Webspam is to promote a target page by increasing its rank. Spotting and discarding Webspam is therefore an important issue for Web search engines that want to provide their users with an unbiased list of results. Webspam techniques evolve constantly to remain effective, but most of the time they still consist of building a specific linking architecture around the target page to increase its rank. In this paper we study the effects of node aggregation on Google's well-known ranking algorithm, PageRank, in the presence of Webspam. Our node aggregation methods build clusters of nodes that are treated as a single node in the PageRank computation. Since the Web graph is far too large for classic clustering techniques, we present four lightweight aggregation techniques suited to its size. Experimental results on the WEBSPAM-UK2007 dataset show the value of the approach, which is further confirmed by statistical evidence.
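The abstract does not describe the four aggregation techniques themselves, so the following is only a minimal Python sketch of the general idea, under stated assumptions: a plain power-iteration PageRank, plus a hypothetical `aggregate` helper that collapses each cluster into one weighted node before ranking. The hand-written `cluster_of` mapping (which conveniently puts a toy link farm in a single cluster), the choice to drop intra-cluster links, and every name below are illustrative assumptions, not the authors' methods.

```python
from collections import defaultdict


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a weighted directed graph.

    nodes: list of hashable node ids.
    edges: dict mapping (src, dst) -> positive weight.
    Rank held by dangling nodes (no out-links) is spread uniformly.
    """
    nodes = list(nodes)
    n = len(nodes)
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        # Random-jump share plus redistributed dangling mass, for every node.
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def aggregate(edges, cluster_of):
    """Collapse every cluster into one node (hypothetical helper).

    cluster_of maps each node to a cluster id. Parallel links between two
    clusters are merged into one edge whose weight is their sum; links inside
    a cluster are dropped, which is an assumption of this sketch.
    """
    agg = defaultdict(float)
    for (src, dst), w in edges.items():
        a, b = cluster_of[src], cluster_of[dst]
        if a != b:
            agg[(a, b)] += w
    return dict(agg)


# Toy Web graph: two honest pages linking to each other, plus a link farm in
# which each spam page s1..s3 links to the target t and t links back to each.
edges = {("h1", "h2"): 1.0, ("h2", "h1"): 1.0}
for s in ("s1", "s2", "s3"):
    edges[(s, "t")] = 1.0
    edges[("t", s)] = 1.0
nodes = ["h1", "h2", "t", "s1", "s2", "s3"]

# Hypothetical clustering: each honest page alone, the whole farm in one cluster.
cluster_of = {"h1": "h1", "h2": "h2", "t": "farm", "s1": "farm", "s2": "farm", "s3": "farm"}

flat = pagerank(nodes, edges)
agg_edges = aggregate(edges, cluster_of)
agg_rank = pagerank(sorted(set(cluster_of.values())), agg_edges)

print("flat:      ", {v: round(r, 3) for v, r in flat.items()})
print("aggregated:", {c: round(r, 3) for c, r in agg_rank.items()})
```

In this toy run, `t` ranks above both honest pages in the flat graph because the farm pages channel their rank to it, whereas the collapsed `farm` node ranks below them once its internal links are gone and its four pages count as a single node. That outcome depends entirely on the assumed clustering and on dropping intra-cluster links.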

Bibliographic details

  • Source
    Neurocomputing, 2012, issue 1, pp. 105-113 (9 pages)
  • Author affiliations

    Univ Paris-Sud, LRI, CNRS, INRIA, Orsay F-91405, France

    Univ Paris-Sud, LRI, CNRS, INRIA, Orsay F-91405, France

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    webspam demotion; clustering
  • Added to database: 2022-08-18 02:07:51
