Conference: International World Wide Web Conference

Trawling the Web for emerging cyber-communities

Abstract

The Web harbors a large number of communities (groups of content-creators sharing a common interest), each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities, those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl, a process we call trawling. We motivate a graph-theoretic approach to locating such communities and describe the algorithms and the algorithmic engineering needed to find structures that fit this notion, the challenges of handling such a huge data set, and the results of our experiment.
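The abstract leaves the graph-theoretic signature unstated. The full paper frames it as a small complete bipartite subgraph, an (i, j)-core: i "fan" pages that each link to the same j "center" pages. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the crawl is given as a list of (source, target) link pairs, and `prune_by_degree` and `find_cores` are hypothetical names. The degree pruning loosely follows the paper's idea of iteratively discarding pages that cannot belong to any core. The brute-force enumeration after it is a stand-in that only works on toy graphs, far short of the engineering a full crawl requires.

```python
from collections import defaultdict
from itertools import combinations


def prune_by_degree(edges, i, j):
    """Repeatedly drop links that cannot belong to any (i, j)-core.

    A fan must link to at least j pages and a center must be linked
    from at least i pages, so any edge whose source has out-degree < j
    or whose target has in-degree < i is discarded. Removing edges can
    lower other degrees, so we loop until nothing changes.
    """
    edges = set(edges)
    while True:
        out_deg = defaultdict(int)
        in_deg = defaultdict(int)
        for src, dst in edges:
            out_deg[src] += 1
            in_deg[dst] += 1
        kept = {(s, d) for s, d in edges if out_deg[s] >= j and in_deg[d] >= i}
        if kept == edges:
            return kept
        edges = kept


def find_cores(edges, i, j):
    """Return (fans, centers) pairs where every fan links to every center.

    centers is a sorted j-tuple; fans is the sorted tuple of all (>= i)
    pages that link to each of those centers. Brute force over j-subsets
    of each fan's out-links, so only suitable for small graphs.
    """
    out_links = defaultdict(set)
    for src, dst in prune_by_degree(edges, i, j):
        out_links[src].add(dst)

    fans_of = defaultdict(set)  # j-tuple of centers -> pages citing all of them
    for src, targets in out_links.items():
        for centers in combinations(sorted(targets), j):
            fans_of[centers].add(src)

    return sorted(
        (tuple(sorted(fans)), centers)
        for centers, fans in fans_of.items()
        if len(fans) >= i
    )


# Toy crawl (hypothetical): a, b, c all cite x, y, z; d and e are noise.
edges = [(f, c) for f in "abc" for c in "xyz"] + [("d", "x"), ("e", "y"), ("e", "w")]
print(find_cores(edges, 3, 3))  # → [(('a', 'b', 'c'), ('x', 'y', 'z'))]
```

Pruning runs first because it throws away most pages cheaply. The combinatorial step that follows grows quickly with j, so it is only affordable on what pruning leaves. In this toy graph, pages d and e drop out in the pruning loop before any subsets are formed.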
