An Ontology-Based Focused Crawler

Abstract

In this paper we present a novel approach for building a focused crawler. The goal of our crawler is to effectively identify web pages that relate to a set of pre-defined topics and download them regardless of their web topology or their connectivity with other popular pages on the web. The main challenges that we address in our study are: (i) how to effectively identify the pages' topical content before they are fully downloaded and processed, and (ii) how to obtain a well-balanced set of training examples that the crawler will regularly consult in its subsequent web visits.
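Challenge (i), judging a page's topic before downloading it, is commonly handled by scoring each outgoing link from the evidence available up front (its URL and anchor text) and keeping the frontier as a best-first priority queue. The sketch below illustrates that general idea only; it is not the authors' method. It assumes a hypothetical `TOPIC_ONTOLOGY` that maps each topic concept to weighted related terms, and it substitutes plain keyword matching for whatever ontology reasoning or classifier the paper actually uses. The names `relevance`, `Frontier`, and `threshold` are illustrative.

```python
from __future__ import annotations

import heapq
import re
from dataclasses import dataclass, field

# Hypothetical topic ontology (assumption): concept -> {related term: weight}.
TOPIC_ONTOLOGY = {
    "machine learning": {"machine": 1.0, "learning": 1.0, "classifier": 0.8, "neural": 0.7},
    "databases": {"database": 1.0, "sql": 0.9, "query": 0.7, "index": 0.5},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def relevance(url: str, anchor_text: str, ontology: dict = TOPIC_ONTOLOGY) -> float:
    """Estimate topical relevance *before* download from the URL and anchor text.

    Returns the best per-concept score, where a concept's score is the sum of
    the weights of its terms that appear among the link's tokens.
    """
    tokens = set(tokenize(url) + tokenize(anchor_text))
    best = 0.0
    for terms in ontology.values():
        score = sum(w for term, w in terms.items() if term in tokens)
        best = max(best, score)
    return best


@dataclass(order=True)
class FrontierEntry:
    priority: float                    # negated score, so heapq pops the best link first
    url: str = field(compare=False)    # excluded from ordering comparisons


class Frontier:
    """Best-first crawl frontier that drops links predicted to be off-topic."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._heap: list[FrontierEntry] = []
        self._seen: set[str] = set()

    def add(self, url: str, anchor_text: str) -> bool:
        """Queue a link if it is new and its estimated relevance meets the threshold."""
        if url in self._seen:
            return False
        score = relevance(url, anchor_text)
        if score < self.threshold:
            return False  # predicted off-topic: never downloaded
        self._seen.add(url)
        heapq.heappush(self._heap, FrontierEntry(-score, url))
        return True

    def pop(self) -> str | None:
        """Return the most promising queued URL, or None when the frontier is empty."""
        return heapq.heappop(self._heap).url if self._heap else None
```

For example, after adding `("http://example.org/db/sql-tips", "SQL query tuning")`, `("http://example.org/ml/neural-networks", "Intro to neural classifiers")`, and `("http://example.org/sports", "Match results")`, the sports link is rejected outright and `pop()` returns the SQL page before the neural-network page because it matches more weighted terms. A real crawler would replace the keyword score with a trained classifier, which is where a well-balanced training set (challenge ii) becomes important.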