Exploiting PageRank at Different Block Level

Abstract

In recent years, information retrieval methods based on link analysis have been developed; PageRank and HITS are two typical examples. Following the hierarchical organization of Web pages, the Web graph can be partitioned into blocks at different levels, such as page level, directory level, host level, and domain level. Given such blocks, we can analyze the different hyperlinks among pages. Several approaches have suggested that intra-hyperlinks within a host may be less useful for computing PageRank. However, there are no reports on how, concretely, intra- and inter-hyperlinks affect PageRank. Furthermore, intra-hyperlink and inter-hyperlink are relative concepts whose meaning depends on the block level. Which level, then, is best for distinguishing intra-hyperlinks from inter-hyperlinks? And what weight ratio between intra-hyperlinks and inter-hyperlinks ultimately improves the performance of the PageRank algorithm? In this paper, we analyze the link distribution at the different block levels and evaluate the importance of intra- and inter-hyperlinks to PageRank on the TREC Web Track data set. Experiments show that retrieval achieves the best performance when blocks are set at the host level and the weight ratio of intra-hyperlinks to inter-hyperlinks is 1:4.
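
The idea of weighting intra-host and inter-host links differently can be illustrated with a small sketch. The abstract does not give the authors' exact formulation, so everything here is an assumption made for illustration: the function names `host_of` and `weighted_pagerank`, the damping factor of 0.85, the uniform handling of pages without out-links, and the choice to apply the 1:4 ratio as per-link weights that are normalized over each page's out-links.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Block key at host level: the hostname part of the URL."""
    return urlsplit(url).hostname or ""


def weighted_pagerank(links, w_intra=1.0, w_inter=4.0, damping=0.85,
                      iterations=50):
    """Power-iteration PageRank with per-link weights (illustrative only).

    links: iterable of (source_url, target_url) pairs.
    A link whose endpoints share a host gets weight w_intra,
    any other link gets weight w_inter.
    """
    links = list(links)
    pages = sorted({page for edge in links for page in edge})
    n = len(pages)

    # out[u] maps each target v to the total weight of links u -> v.
    out = {page: {} for page in pages}
    for src, dst in links:
        weight = w_intra if host_of(src) == host_of(dst) else w_inter
        out[src][dst] = out[src].get(dst, 0.0) + weight

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[page] for page in pages if not out[page])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for src, targets in out.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                # Each page passes on its rank in proportion to link weight.
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Hypothetical toy graph: two hosts, one page links both inside and outside.
example_links = [
    ("http://a.example/index.html", "http://a.example/about.html"),
    ("http://a.example/index.html", "http://b.example/"),
    ("http://a.example/about.html", "http://a.example/index.html"),
    ("http://b.example/", "http://a.example/index.html"),
]
example_ranks = weighted_pagerank(example_links)
```

In this toy graph, `index.html` passes four fifths of its outgoing rank to the page on the other host and one fifth to the page on its own host. Swapping `host_of` for a function that returns the directory or the registered domain would move the block boundary to a different level.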