Conference: Workshop on Intelligent Information Technology Application

The Design and Implementation of a Topic-Driven Crawler

Abstract

Users browsing the Internet need web pages to be classified under a given topic as accurately as possible. As a result, topic-driven crawlers are becoming important tools for supporting applications such as specialized web portals, online search, and competitive intelligence. This paper presents a topic-driven crawler that computes the degree of relevance of web pages and refines the preliminary set of related pages using term frequency/document frequency, entropy, and compiled rules. It also gives a comparatively ideal system architecture for a topic-driven crawler, shows how its modules relate to one another, and describes several of the modules in detail.
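
The abstract names term frequency/document frequency as one input to the relevance computation but does not give a formula. The following is a minimal sketch of how such a score might look, assuming a smoothed TF-IDF weighting and a hypothetical set of topic terms; the function names, the smoothing, and the "share of weight on topic terms" score are illustrative assumptions, not the paper's method, and the entropy and rule-based refinement steps are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequencies(corpus):
    """Count, for each term, how many documents in the corpus contain it."""
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return doc_freq


def term_weights(tokens, doc_freq, n_docs):
    """Smoothed TF-IDF weight per term (assumption: the paper's exact
    term frequency/document frequency formula is not given in the abstract)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        term: (count / total)
        * (math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0)
        for term, count in counts.items()
    }


def relevance(page_text, topic_terms, doc_freq, n_docs):
    """Share of the page's total term weight carried by topic terms, in [0, 1]."""
    weights = term_weights(tokenize(page_text), doc_freq, n_docs)
    total = sum(weights.values())
    if total == 0:
        return 0.0
    on_topic = sum(w for term, w in weights.items() if term in topic_terms)
    return on_topic / total


# Hypothetical three-page corpus and topic vocabulary, for illustration only.
corpus = [
    "python web crawler fetches pages",
    "cooking pasta recipes",
    "crawler queue of web links",
]
topic_terms = {"crawler", "web", "pages", "links"}
doc_freq = document_frequencies(corpus)
scores = [relevance(doc, topic_terms, doc_freq, len(corpus)) for doc in corpus]
```

A crawler built along these lines could keep only pages whose score exceeds a chosen threshold as the preliminary set, then apply further refinement; the threshold itself would have to be tuned and is not specified in the abstract.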
