Source: International Conference on Management and Service Science

Topic-specific crawling on the Web with concept context graph based on FCA

Abstract

Topic-specific crawling does not crawl every web page; it crawls only the pages related to a user's interests, and the pages most relevant to those interests should be crawled first. The major problem in focused crawling is how to assign proper credit to the unvisited pages that the crawler will visit. In this paper, we propose an effective approach that solves this problem with a concept context graph based on Formal Concept Analysis (FCA). We build a concept lattice from the visited pages, and then use a term-combination method to construct our concept context graph from the upper concept lattice. Our crawler can estimate a page's expected relevance to a given topic and determine the order in which pages should be visited. An experiment indicates that the new method is an effective mechanism that produces considerable results.
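
The abstract does not give the construction in detail, so the following is only a minimal sketch of the two ideas it names: deriving formal concepts from visited pages, and ordering unvisited pages by an estimated relevance. The page names, terms, the `CONTEXT` table, and the Jaccard-style score in `expected_relevance` are all assumptions made for illustration; they are not the paper's concept context graph or its credit-assignment rule.

```python
import heapq
from itertools import combinations

# Hypothetical formal context: visited page -> set of terms found on it.
CONTEXT = {
    "p1": {"crawler", "topic"},
    "p2": {"crawler", "topic", "lattice"},
    "p3": {"lattice"},
}

def common_terms(pages, context):
    """Derivation operator: terms shared by every page in `pages`."""
    result = set().union(*context.values())
    for page in pages:
        result &= context[page]
    return result

def pages_with(terms, context):
    """Derivation operator: pages that contain every term in `terms`."""
    return {page for page, page_terms in context.items() if terms <= page_terms}

def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of pages.

    Exponential in the number of pages, so suitable for toy contexts only.
    """
    concepts = set()
    pages = sorted(context)
    for size in range(len(pages) + 1):
        for subset in combinations(pages, size):
            intent = common_terms(subset, context)
            extent = pages_with(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

def expected_relevance(link_terms, concepts):
    """Assumed placeholder score: best Jaccard overlap with any concept intent."""
    best = 0.0
    for _extent, intent in concepts:
        union = link_terms | intent
        if union:
            best = max(best, len(link_terms & intent) / len(union))
    return best

def crawl_order(candidates, concepts):
    """Return candidate URLs sorted from highest to lowest estimated relevance."""
    heap = [(-expected_relevance(terms, concepts), url)
            for url, terms in candidates.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

if __name__ == "__main__":
    concepts = formal_concepts(CONTEXT)
    # Hypothetical unvisited links, described by the terms near each link.
    candidates = {
        "u1": {"crawler", "topic"},
        "u2": {"sports"},
        "u3": {"lattice", "topic"},
    }
    print(crawl_order(candidates, concepts))
```

In this sketch the frontier is a max-priority queue built with `heapq` by negating the score, so the link whose terms best match some concept intent is popped first. A real crawler would rebuild or update the concepts as new pages are visited.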