International Conference on Web Engineering

Identifying Websites with Flow Simulation

Abstract

We present a method for discovering the set of webpages contained in a logical website, based on the link structure of the Web graph. Such a method is useful for Web archiving and for computing website importance. To identify the boundaries of a website, we combine an online version of the preflow-push algorithm, an algorithm for the maximum flow problem in traffic networks, with the Markov CLuster (MCL) algorithm. MCL is run on a crawled portion of the Web graph to build a seed of initial webpages, and the online preflow-push algorithm then extends this seed. We describe an experiment on a subsite of the INRIA website.
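
For readers unfamiliar with preflow-push, the following is a minimal sketch of the textbook, offline form of the algorithm on a small directed graph. It is not the online variant used in the paper, and it does not cover the MCL seeding step. The function name `max_flow_push_relabel`, the edge-dictionary input format, and the toy page names are assumptions made for illustration only.

```python
from collections import defaultdict


def max_flow_push_relabel(capacity, source, sink):
    """Textbook (offline) preflow-push maximum flow.

    capacity: dict mapping (u, v) -> non-negative capacity of edge u -> v.
    Returns the value of a maximum flow from source to sink.
    """
    # Residual capacities; missing reverse edges default to capacity 0.
    residual = defaultdict(int)
    neighbors = defaultdict(set)
    for (u, v), c in capacity.items():
        if u == v:
            continue  # self-loops never carry useful flow
        residual[(u, v)] += c
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = set(neighbors) | {source, sink}

    height = {u: 0 for u in nodes}
    excess = {u: 0 for u in nodes}
    height[source] = len(nodes)

    # Initial preflow: saturate every edge leaving the source.
    for v in neighbors[source]:
        c = residual[(source, v)]
        if c > 0:
            residual[(source, v)] -= c
            residual[(v, source)] += c
            excess[v] += c
            excess[source] -= c

    active = [u for u in nodes if u not in (source, sink) and excess[u] > 0]
    while active:
        u = active.pop()
        # Discharge u: push along admissible edges, relabel when stuck.
        while excess[u] > 0:
            pushed = False
            for v in neighbors[u]:
                if residual[(u, v)] > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual[(u, v)])
                    residual[(u, v)] -= delta
                    residual[(v, u)] += delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (source, sink) and v not in active:
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbor.
                height[u] = 1 + min(
                    height[v] for v in neighbors[u] if residual[(u, v)] > 0
                )
    return excess[sink]


# Toy link graph (hypothetical pages); capacities are arbitrary.
toy_links = {("home", "docs"): 3, ("docs", "external"): 2}
print(max_flow_push_relabel(toy_links, "home", "external"))  # prints 2
```

In this sketch the whole graph must be known before the computation starts. The abstract describes an online version, which suggests the graph is explored while flow is pushed, but the details of that variant are not given here.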