Temporal Evolution of the UK Web

Abstract

Recently, a new temporal dataset has been made public: it consists of a series of twelve snapshots of the .uk domain, each of about 100M pages [BSVLTAG]. The Web graphs of the twelve snapshots have been merged into a single time-aware graph that provides constant-time access to temporal information. In this paper we present the first statistical analysis performed on this graph, with the goal of checking whether the information it contains is reliable (i.e., whether it depends essentially on the appearance and disappearance of pages and links, or on the behaviour of the crawler). We perform a number of tests showing that the graph is indeed reliable, and we provide the first public data on the evolution of the Web that are based on a large-scale sample with significant diversity in the sites considered.
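
The abstract says the twelve snapshot graphs are merged into one time-aware graph with constant-time access to temporal information. The sketch below shows one minimal way to obtain that property, assuming (as an illustration only, not the authors' data structure) that each arc carries a 12-bit mask whose bit t records whether the arc exists in snapshot t. The class `TimeAwareGraph` and its methods are hypothetical names. Checking an arc in a given snapshot is then a hash lookup plus a bit test, and counting the arcs that appear or disappear between consecutive snapshots is a single pass over the masks, a simple instance of measuring the appearance and disappearance of links that the abstract refers to.

```python
from dataclasses import dataclass, field

# Assumption: snapshots are numbered 0..11 in chronological order.
NUM_SNAPSHOTS = 12


@dataclass
class TimeAwareGraph:
    """Hypothetical time-aware graph: each arc maps to a presence bitmask.

    Bit t of the mask is set iff the arc is present in snapshot t.
    """

    arcs: dict[tuple[int, int], int] = field(default_factory=dict)

    def add_arc(self, src: int, dst: int, snapshot: int) -> None:
        """Record that the arc src -> dst is present in the given snapshot."""
        if not 0 <= snapshot < NUM_SNAPSHOTS:
            raise ValueError(f"snapshot must be in [0, {NUM_SNAPSHOTS})")
        self.arcs[(src, dst)] = self.arcs.get((src, dst), 0) | (1 << snapshot)

    def present(self, src: int, dst: int, snapshot: int) -> bool:
        """Constant-time check (dict lookup + bit test) for one snapshot."""
        return bool((self.arcs.get((src, dst), 0) >> snapshot) & 1)

    def lifetime(self, src: int, dst: int) -> list[int]:
        """Snapshots in which the arc is present, in chronological order."""
        mask = self.arcs.get((src, dst), 0)
        return [t for t in range(NUM_SNAPSHOTS) if (mask >> t) & 1]

    def changes(self, t: int) -> tuple[int, int]:
        """Count arcs that appear and disappear between snapshots t and t + 1."""
        if not 0 <= t < NUM_SNAPSHOTS - 1:
            raise ValueError(f"t must be in [0, {NUM_SNAPSHOTS - 1})")
        appeared = disappeared = 0
        for mask in self.arcs.values():
            before = (mask >> t) & 1
            after = (mask >> (t + 1)) & 1
            if after and not before:
                appeared += 1
            elif before and not after:
                disappeared += 1
        return appeared, disappeared


if __name__ == "__main__":
    g = TimeAwareGraph()
    g.add_arc(1, 2, 0)
    g.add_arc(1, 2, 3)
    g.add_arc(2, 3, 3)
    print(g.present(1, 2, 1))  # False
    print(g.lifetime(1, 2))    # [0, 3]
    print(g.changes(2))        # (2, 0)
```

A Python dict keyed by node pairs would be far too large for snapshots of 100M pages; a compressed representation would be needed in practice, so this only illustrates the access pattern.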