
Adaptive Focused Crawling of Linked Data


Abstract

Given the evolution of publicly available Linked Data, crawling and preservation have become increasingly important challenges. Due to the scale of data available on the Web, efficient focused crawling approaches are required that can capture the relevant semantic neighborhood of seed entities. Here, determining which entities are relevant for a given set of seed entities is a crucial problem. Because the weights of seeds within a seed list vary significantly with respect to the crawl intent, we argue that an adaptive crawler is required, one that considers such characteristics when configuring the crawling and relevance detection approach. To address this problem, we introduce a crawling configuration that considers seed-list-specific features as part of its crawling and ranking algorithm. We evaluate it through extensive experiments, comparing it against a number of baseline methods and crawling parameters. We demonstrate that configurations which consider seed list features outperform the baselines, and we present further insights gained from our experiments.
