Journal: Journal of information and computational science

Focused Crawler Based on Domain Ontology and FCA

Abstract

A focused crawler is a web crawler that selectively seeks out web pages relevant to a predefined set of crawling topics, instead of searching the whole Web exhaustively. In this paper, we propose an effective focused web crawling method based on domain ontology and Formal Concept Analysis (FCA). The method first constructs a core similarity graph based on WordNet and concept relatedness, and then combines it with concept lattice knowledge to build a Similarity Concept Context Graph (SCCG). On the basis of the SCCG, we propose a focused web crawling method that measures a page's expected relevancy to a given topic and determines which URL should be crawled first. Experimental results show that our approach achieves higher recall rates than the standard breadth-first approach, the Context Graph (CG) approach, and the Relevancy Context Graph (RCG) approach. In conclusion, the results demonstrate the effectiveness and significance of our approach.
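To make the contrast with breadth-first crawling concrete, the following is a minimal sketch of a best-first crawl frontier, in which the URL with the highest expected relevancy is always crawled next. It is not the SCCG method itself: the abstract does not say how relevancy is computed, so `score` is a stand-in callable, and `best_first_crawl`, `get_links`, `TOY_GRAPH`, and `TOY_SCORES` are hypothetical names and data invented for illustration. No network access is involved; the "Web" is a small in-memory link graph.

```python
import heapq
from typing import Callable, List


def best_first_crawl(
    seeds: List[str],
    get_links: Callable[[str], List[str]],
    score: Callable[[str], float],
    max_pages: int,
) -> List[str]:
    """Visit up to max_pages URLs, always expanding the highest-scored one next."""
    # heapq is a min-heap, so scores are negated; the counter breaks ties
    # in insertion order.
    frontier = []
    seen = set()
    counter = 0
    for url in seeds:
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-score(url), counter, url))
            counter += 1

    visited = []
    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), counter, link))
                counter += 1
    return visited


# Toy in-memory link graph and made-up relevancy scores (hypothetical data,
# standing in for whatever an SCCG-based estimate would produce).
TOY_GRAPH = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "c": [],
    "d": [],
}
TOY_SCORES = {"seed": 1.0, "a": 0.2, "b": 0.9, "c": 0.1, "d": 0.8}

order = best_first_crawl(
    ["seed"],
    get_links=lambda u: TOY_GRAPH.get(u, []),
    score=lambda u: TOY_SCORES.get(u, 0.0),
    max_pages=4,
)
print(order)  # ['seed', 'b', 'd', 'a']
```

With the same four-page budget, a breadth-first crawler would visit `seed`, `a`, `b`, `c` and miss the relevant page `d`. The best-first frontier reaches it by following the high-scoring branch first, which is the kind of effect a recall comparison against breadth-first crawling measures.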
