Journal: IEICE Transactions on Information and Systems

Improvements of HITS Algorithms for Spam Links



Abstract

The HITS algorithm proposed by Kleinberg is one of the representative methods for scoring Web pages by using hyperlinks. When the algorithm was proposed, most of the pages it scored highly were genuinely related to the given topic, and hence the algorithm could be used to find related pages. However, because of the increase in spam links, neither the algorithm nor its variants, including Bharat's improved HITS (BHITS) proposed by Bharat and Henzinger, can be used to find related pages on today's Web. In this paper, we first propose three methods for finding "linkfarms," that is, sets of spam links forming a densely connected subgraph of a Web graph. We then present an algorithm, called the trust-score algorithm, that gives high scores to pages that are, with high probability, not spam pages. Combining the three methods and the trust-score algorithm with BHITS, we obtain several variants of the HITS algorithm. We ascertain by experiments that one of them, named TaN+BHITS, which uses the trust-score algorithm and the method of finding linkfarms by employing name servers, is the most suitable for finding related pages on today's Web. Our algorithms require no more time and memory than the original HITS algorithm and can be executed on a PC with a small amount of main memory.
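For readers unfamiliar with the baseline, the following is a minimal sketch of the original HITS hub/authority iteration that the abstract builds on. It is not the paper's trust-score algorithm, its linkfarm detection, or TaN+BHITS, none of which are specified in the abstract. The function names `hits` and `normalize`, the edge-list input format, and the fixed iteration count are assumptions made for illustration.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Basic HITS on a directed graph given as (source, target) pairs.

    Returns two dicts: authority scores and hub scores.
    The iteration count is an illustrative choice, not taken from the paper.
    """
    nodes = sorted({n for edge in edges for n in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of pages each page links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    # Hypothetical toy graph: pages "a" and "b" both link to "c"; "a" also links to "d".
    authority, hub_score = hits([("a", "c"), ("b", "c"), ("a", "d")])
    print("authority:", authority)
    print("hub:", hub_score)
```

Because every link counts equally in this baseline, a densely interlinked group of pages can reinforce its own hub and authority scores, which is the weakness to spam links that the abstract describes.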
