International Conference on Communication Systems Software and Middleware

Characterizing the Web Using a New Uniform Sampling Approach

Abstract

The Web is one of the biggest sources of information for many people, and it continues to grow. To make the Web easier to use, people frequently rely on Web search engines (WSEs). However, little is known about the characteristics of the Web or of WSEs. One common way to analyze these characteristics is to use a uniform sample: instead of working on the entire Web, we work on a small subset that represents the entire Web. In this paper, we propose a new method, called Bucket-Based Sampling (BBS), to gather such a small but uniform subset of the Web. Our analyses show that BBS improves sample uniformity by at least 6.95% relative to PAGERANK-SMP, one of the best existing methods. Using samples gathered by BBS, we compare the relative sizes of seven well-known WSEs. We also estimate several important characteristics of the Web; for example, we estimate that the indexable Web contains around 20.14 billion pages.
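The abstract does not say how relative engine sizes are computed from BBS samples. One common approach, shown here only as a minimal sketch and not necessarily the authors' procedure, is the overlap-ratio estimator. You draw a uniform sample from each engine's index and check what fraction of each sample the other engine also indexes. If the samples are uniform, the fraction of A's sample found in B approximates |A∩B|/|A|, and the fraction of B's sample found in A approximates |A∩B|/|B|. Dividing the second fraction by the first therefore estimates |A|/|B|. The function names and the membership callbacks (`is_in_a`, `is_in_b`) are hypothetical stand-ins for querying a real search engine.

```python
from typing import Callable, Iterable


def overlap_fraction(sample: Iterable[str], is_indexed: Callable[[str], bool]) -> float:
    """Fraction of sampled URLs that another engine reports as indexed.

    `is_indexed` is a hypothetical callback standing in for a real
    "is this URL in engine X's index?" check.
    """
    urls = list(sample)
    if not urls:
        raise ValueError("sample must be non-empty")
    hits = sum(1 for url in urls if is_indexed(url))
    return hits / len(urls)


def relative_size(
    sample_from_a: Iterable[str],
    is_in_b: Callable[[str], bool],
    sample_from_b: Iterable[str],
    is_in_a: Callable[[str], bool],
) -> float:
    """Estimate |A| / |B| from uniform samples of two engines' indexes.

    Assumes each sample is uniform over its own engine's index.
    """
    frac_a_in_b = overlap_fraction(sample_from_a, is_in_b)  # approx. |A∩B| / |A|
    frac_b_in_a = overlap_fraction(sample_from_b, is_in_a)  # approx. |A∩B| / |B|
    if frac_a_in_b == 0:
        raise ValueError("no overlap observed; ratio is undefined")
    # (|A∩B|/|B|) / (|A∩B|/|A|) = |A| / |B|
    return frac_b_in_a / frac_a_in_b


# Toy example: engine B indexes exactly half of engine A's pages.
engine_a = {f"http://example.org/{i}" for i in range(200)}
engine_b = {f"http://example.org/{i}" for i in range(100, 200)}
print(relative_size(engine_a, engine_b.__contains__, engine_b, engine_a.__contains__))  # 2.0
```

Here the "samples" are the full toy indexes, so they are trivially uniform and the estimate is exact. With real engines, the estimate is only as reliable as the uniformity of the samples, which is the property BBS aims to improve.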