ORCA - a Benchmark for Data Web Crawlers

Abstract

The number of RDF knowledge graphs available on the Web grows constantly. Gathering these graphs at large scale for downstream applications hence requires the use of crawlers. Although Data Web crawlers exist, and general Web crawlers could be adapted to focus on the Data Web, there is currently no benchmark to fairly evaluate their performance. Our work closes this gap by presenting the Orca benchmark. Orca generates a synthetic Data Web, which is decoupled from the original Web and enables a fair and repeatable comparison of Data Web crawlers. Our evaluations show that Orca can be used to reveal the different advantages and disadvantages of existing crawlers. The benchmark is open-source and available at https://w3id.org/dice-research/orca.

Bibliographic details

Source
IEEE International Conference on Semantic Computing | 2021 | pp. 272-279 | 8 pages
Authors
Michael Röder; Geraldo de Souza; Denis Kuchelev; Abdelmoneim Amer Desouki; Axel-Cyrille Ngonga Ngomo
Original format: PDF
Keywords
Protocols; Crawlers; Semantics; Benchmark testing; Resource description framework; Robots; Open source software

Similar literature

1. WIVET - Benchmarking Coverage Qualities of Web Crawlers [J]. Emin İslam Tatlı, Bedirhan Urgun. The Computer Journal, 2017, No. 4.
2. Web Crawler: Extracting the Web Data [J]. Mini Singh Ahuja, Dr Jatinder Singh Bal, Varnica. International Journal of Computer Trends and Technology, 2014, No. 3.
3. To Whom Do Data Belong? Data Ownership and Protection in the Context of Web-Crawlers [J]. Ding Xiaodong, Ryan (translator). Contemporary Social Sciences (English edition), 2020, No. 6.
4. Accuracy Crawler: An Accurate Crawler for Deep Web Data Extraction [C]. Prafful Mishra, Anshul Khurana. International Conference on Control, Power, Communication and Computing Technologies, 2018.
5. Design and implementation of an intelligent Web crawler for corporate data scraping [D]. Qin, Xinfeng. 2007.
6. Using Data Crawlers and Semantic Web to Build Financial XBRL Data Generators: The SONAR Extension Approach [O]. Miguel Ángel Rodríguez-García, Alejandro Rodríguez-González, Ricardo Colomo-Palacios.
7. ORCA - a Benchmark for Data Web Crawlers [O]. Michael Röder, Geraldo de Souza, Denis Kuchelev. 2021.
