4th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2008)

Robust PageRank and locally computable spam detection features



Abstract

Before the advent of the World Wide Web, information retrieval algorithms were developed for relatively small and coherent document collections such as newspaper articles or book catalogs in a library. In comparison to these collections, the Web is massive, much less coherent, changes more rapidly, and is spread over geographically distributed computers. Scaling information retrieval algorithms to the World Wide Web is a challenging task. Success to date is reflected in the ubiquitous use of search engines to access Internet content.

From the point of view of a search engine, the Web is a mix of two types of content: the "closed Web" and the "open Web". The closed Web comprises a few high-quality controlled collections which a search engine can fully trust. The "open Web," on the other hand, includes the vast majority of Web pages, which lack an authority asserting their quality. The openness of the Web has been the key to its rapid growth and success. However, this openness is also a major source of new challenges for information retrieval methods.

Adversarial Information Retrieval addresses tasks such as gathering, indexing, filtering, retrieving, and ranking information from collections in which a subset has been manipulated maliciously. On the Web, the predominant form of such manipulation is "search engine spamming," or spamdexing: malicious attempts to influence the outcome of ranking algorithms, aimed at getting an undeserved high ranking for some items in the collection. There is an economic incentive to rank higher in search engines, since a good ranking is strongly correlated with more traffic, which often translates into more revenue.
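To make the manipulation concrete, the following is a minimal sketch of classic PageRank power iteration on a toy link graph; it illustrates the kind of link-based ranking algorithm that spamdexing targets, and how a small "link farm" can inflate one page's score. The graph, node names, and parameters are illustrative assumptions, not taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node keeps the (1 - damping) teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in links.items():
            if outlinks:
                # Split this node's rank evenly among its out-links.
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its rank uniformly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# A tiny hypothetical "link farm": low-value pages pointing at one
# target inflate its score -- the manipulation that robust PageRank
# variants and spam-detection features aim to resist.
graph = {
    "target": [],
    "farm1": ["target"],
    "farm2": ["target"],
    "honest": ["farm1"],
}
scores = pagerank(graph)
```

Here the two farm pages funnel their rank into `target`, which ends up with the highest score despite having no independent endorsement, which is precisely the undeserved high ranking the abstract describes.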


