
Filtering Spamming Pages Utilizing a Modified PageRank Algorithm


Abstract

The World Wide Web has become an entrenched global medium for storing and searching information. Most people begin at a Web search engine to find information, but the pertinent search results are often greatly diluted by irrelevant data, or appear on target yet still lead the user in an unwanted direction. One intentional, sometimes malicious manipulation of Web databases is the spamming page, as in Google bombing, which exploits the PageRank algorithm, one of many Web structure mining techniques. In this paper, we regard the World Wide Web as a directed labeled graph in which Web pages are nodes and links are edges. We define the label of an edge as consisting of the link context and a similarity measure between the link context and the target page. Using this similarity, we modify the transition matrix of the PageRank algorithm. Through a motivating example, we explain how the proposed algorithm can filter Web spamming pages effectively.
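
The abstract does not give the similarity measure or the exact form of the modified transition matrix, so the following is only a minimal sketch of the general idea under stated assumptions: similarity is taken to be bag-of-words cosine similarity, each row of the transition matrix is made proportional to the similarity labels of its outgoing edges, and the damping factor is the conventional 0.85. The function names and the toy pages and links are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity (an assumed measure, not the paper's)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_weighted_pagerank(pages, links, damping=0.85, iterations=50):
    """pages: {page_id: page_text}; links: [(source, target, link_context)]."""
    n = len(pages)
    # Edge label: similarity between the link context and the target page.
    weights = {p: {} for p in pages}
    for src, dst, context in links:
        sim = cosine_similarity(context, pages[dst])
        weights[src][dst] = weights[src].get(dst, 0.0) + sim

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            total = sum(weights[src].values())
            if total == 0.0:
                # No outgoing edge with a matching context: spread rank uniformly.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
            else:
                # Transition probability proportional to the edge similarity.
                for dst, w in weights[src].items():
                    new_rank[dst] += damping * rank[src] * w / total
        rank = new_rank
    return rank


# Hypothetical toy graph: "hub" links to "bomb" with a context unrelated to it.
pages = {
    "hub": "directory of links",
    "bio": "official biography of the president",
    "bomb": "recipes for cooking pasta at home",
}
links = [
    ("hub", "bio", "official biography"),
    ("hub", "bomb", "miserable failure"),
    ("bio", "hub", "directory"),
    ("bomb", "hub", "links"),
]
ranks = similarity_weighted_pagerank(pages, links)
```

In this toy graph the link into `bomb` has zero similarity with its target, so `bomb` receives only the teleportation share of the rank, whereas plain PageRank would split the rank of `hub` evenly between `bio` and `bomb`. Normalizing each row is one design choice among several; it keeps the matrix stochastic but does not penalize a page whose outgoing links all have low similarity.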
