Conference: International Conference on Database Systems for Advanced Applications

Finding a web community by maximum flow algorithm with HITS score based capacity

Abstract

In this paper we propose an edge capacity based on hub and authority scores, and examine its effect on the method proposed by G. Flake et al. for extracting web communities with a maximum flow algorithm. A web community is a collection of web pages that share a common (or related) topic. In recent years, various methods for finding web communities have been proposed. The method of G. Flake et al., which is based on a maximum flow algorithm, has a major advantage: "topic drift" does not easily occur. On the other hand, it assigns the same fixed capacity to every edge, which is one of the major causes of failing to obtain a proper web community. Our approach, which uses a HITS-score-based edge capacity, effectively extracts web pages that retain a good balance of global and local relations to the given seed node. We examined the effects in experiments on 20 randomly selected topics, using a Japanese web archive crawled in 2002. The results confirmed that the average precision rose by approximately 20%.
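The pipeline described in the abstract can be illustrated with a small sketch. The abstract does not give the capacity formula, so the weighting `hub[u] + auth[v]` below is purely an assumption for illustration, as are the function names, the toy graph, the virtual `_source` and `_sink` nodes, the unit sink capacity, and the choice to treat links as bidirectional. The max-flow routine is a plain Edmonds-Karp implementation, not necessarily the one the authors used.

```python
from collections import defaultdict, deque

EPS = 1e-12  # residual capacities at or below this are treated as saturated


def hits_scores(edges, iterations=50):
    """Hub and authority scores by power iteration (L2-normalised)."""
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        auth = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            auth[v] += hub[u]          # authority: sum of hubs linking in
        new_hub = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            new_hub[u] += auth[v]      # hub: sum of authorities linked to
        a_norm = sum(x * x for x in auth.values()) ** 0.5 or 1.0
        h_norm = sum(x * x for x in new_hub.values()) ** 0.5 or 1.0
        auth = {n: x / a_norm for n, x in auth.items()}
        hub = {n: x / h_norm for n, x in new_hub.items()}
    return hub, auth


def bfs_parents(cap, source):
    """Breadth-first search over edges that still have residual capacity."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > EPS and v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def max_flow(cap, source, sink):
    """Edmonds-Karp; mutates cap into the residual graph, returns the flow."""
    total = 0.0
    while True:
        parent = bfs_parents(cap, source)
        if sink not in parent:
            return total
        path, node = [], sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        total += bottleneck


def build_network(edges, seeds, capacity_of, sink_capacity=1.0):
    """Flow network: virtual source -> seeds, every non-seed page -> virtual sink."""
    cap = defaultdict(lambda: defaultdict(float))
    nodes = {n for edge in edges for n in edge}
    for u, v in edges:
        c = capacity_of(u, v)
        cap[u][v] += c
        cap[v][u] += c                 # assumption: links act in both directions
    for s in seeds:
        cap["_source"][s] = float("inf")
    for n in nodes - set(seeds):
        cap[n]["_sink"] = sink_capacity
    return cap


def extract_community(edges, seeds, capacity_of):
    """Community = pages still reachable from the source after max flow."""
    cap = build_network(edges, seeds, capacity_of)
    max_flow(cap, "_source", "_sink")
    return set(bfs_parents(cap, "_source")) - {"_source"}


# Toy graph: a dense cluster {a, b, c} with a single link out to {x, y, z}.
edges = [("a", "b"), ("a", "c"), ("b", "a"), ("b", "c"), ("c", "a"),
         ("c", "x"), ("x", "y"), ("y", "z"), ("z", "x")]
hub, auth = hits_scores(edges)
baseline = extract_community(edges, ["a"], lambda u, v: 1.0)  # fixed capacity
weighted = extract_community(edges, ["a"], lambda u, v: hub[u] + auth[v])
```

Passing the capacity as a function keeps the fixed-capacity baseline and the score-based variant on the same code path, so the two extracted communities can be compared directly for a given seed.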