Journal: Computer Networks

Trawling the Web for emerging cyber-communities



Abstract

The Web harbors a large number of communities (groups of content creators sharing a common interest), each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities, those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl; we call this process trawling. We motivate a graph-theoretic approach to locating such communities and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
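
The abstract does not say which graph structure is searched for. As an illustration only, the sketch below assumes one common formalization from the web-graph literature: a community signature is a small complete bipartite subgraph (a "core") in which i "fan" pages all link to the same j "center" pages. The function names `prune` and `bipartite_cores`, the thresholds, and the toy link data are hypothetical and are not taken from this page. The brute-force enumeration suits toy inputs only and is not meant to represent the engineering the abstract alludes to.

```python
from itertools import combinations

def prune(links, i, j):
    """Repeatedly drop fans with fewer than j out-links and
    centers with fewer than i in-links (assumed pruning rule)."""
    links = {u: set(vs) for u, vs in links.items()}
    changed = True
    while changed:
        changed = False
        # Count in-links contributed by the fans that are still alive.
        indeg = {}
        for vs in links.values():
            for v in vs:
                indeg[v] = indeg.get(v, 0) + 1
        weak_centers = {v for v, d in indeg.items() if d < i}
        for u in list(links):
            kept = links[u] - weak_centers
            if len(kept) != len(links[u]):
                changed = True
            if len(kept) < j:
                del links[u]      # this page cannot be a fan of any core
                changed = True
            else:
                links[u] = kept
    return links

def bipartite_cores(links, i, j):
    """Enumerate every (fans, centers) pair forming a complete
    bipartite subgraph with i fans and j centers (brute force)."""
    pruned = prune(links, i, j)
    cores = set()
    for fans in combinations(sorted(pruned), i):
        common = set.intersection(*(pruned[f] for f in fans))
        for centers in combinations(sorted(common), j):
            cores.add((fans, centers))
    return cores

# Hypothetical toy crawl: page -> set of pages it links to.
example_links = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"x", "y", "z"},
    "d": {"x"},
    "e": {"q"},
}
print(bipartite_cores(example_links, 3, 3))
```

Pruning first matters on large inputs: pages that cannot take part in any core are discarded before the combinatorial step, which is one plausible way to cope with a crawl-sized graph.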
