Conference: IEEE International Conference on Semantic Computing

ORCA - a Benchmark for Data Web Crawlers

Abstract

The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open-source and available at https://w3id.org/dice-research/orca.