Network Traffic Measurement and Analysis Conference
Kraaler: A User-Perspective Web Crawler

Abstract

The technologies adopted on the web change frequently, requiring applications that interact with the web to continuously adapt their parsing capabilities. As a result, most web crawlers either implement simplistic parsing of their own, which diverges from how web browsers parse pages, or drive a web browser through high-level interactions that restrict the information that can be observed. We introduce Kraaler, an open-source universal web crawler that uses the Chrome Debugging Protocol, enabling it to use the Blink browser engine for parsing while still obtaining protocol-level information. The crawler stores the collected information in a database and on the file system. The implementation has been evaluated in a predictable environment to ensure the correctness of the collected data. It has additionally been evaluated in a real-world scenario, demonstrating the impact of the parsing capabilities on data collection.
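To make the idea of "protocol-level information obtained through the browser" more concrete, here is a minimal sketch. It is not Kraaler's implementation. It assumes the crawler has already received the browser's Network-domain event messages over the Chrome Debugging Protocol (now documented as the Chrome DevTools Protocol), namely `Network.requestWillBeSent`, `Network.responseReceived` and `Network.loadingFinished`, plus the results of `Network.getResponseBody`, all as Python dicts. The helper names `correlate` and `store`, the SQLite table `exchanges`, and the content-hashed body files are hypothetical choices for illustration; the WebSocket connection to a live browser is omitted.

```python
import base64
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical table layout for illustration; not Kraaler's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS exchanges (
    request_id TEXT PRIMARY KEY,
    method TEXT, url TEXT, status INTEGER,
    protocol TEXT, remote_ip TEXT, mime_type TEXT,
    encoded_length INTEGER, body_path TEXT
)
"""


def correlate(events):
    """Group DevTools Protocol Network-domain events by requestId.

    Returns one dict per request that reached Network.loadingFinished.
    Redirects (a repeated requestWillBeSent for the same requestId) simply
    overwrite the earlier hop here; a real crawler should keep each hop.
    """
    pending, finished = {}, []
    for msg in events:
        method, params = msg["method"], msg.get("params", {})
        rid = params.get("requestId")
        if method == "Network.requestWillBeSent":
            req = params["request"]
            pending[rid] = {"request_id": rid, "method": req["method"], "url": req["url"]}
        elif method == "Network.responseReceived" and rid in pending:
            resp = params["response"]
            pending[rid].update(
                status=resp.get("status"),
                protocol=resp.get("protocol"),  # e.g. "http/1.1" or "h2"
                remote_ip=resp.get("remoteIPAddress"),
                mime_type=resp.get("mimeType"),
            )
        elif method == "Network.loadingFinished" and rid in pending:
            record = pending.pop(rid)
            record["encoded_length"] = params.get("encodedDataLength")
            finished.append(record)
    return finished


def store(records, bodies, db, body_dir):
    """Write metadata rows to SQLite and response bodies to files.

    `bodies` maps requestId -> a Network.getResponseBody result, i.e.
    {"body": str, "base64Encoded": bool}. Naming files by the SHA-256 of
    their content is an assumption made here to deduplicate bodies.
    """
    body_dir.mkdir(parents=True, exist_ok=True)
    for rec in records:
        path = None
        result = bodies.get(rec["request_id"])
        if result is not None:
            if result["base64Encoded"]:
                data = base64.b64decode(result["body"])
            else:
                data = result["body"].encode("utf-8")
            path = body_dir / hashlib.sha256(data).hexdigest()
            path.write_bytes(data)
        db.execute(
            "INSERT OR REPLACE INTO exchanges VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (rec["request_id"], rec["method"], rec["url"], rec.get("status"),
             rec.get("protocol"), rec.get("remote_ip"), rec.get("mime_type"),
             rec.get("encoded_length"), str(path) if path else None),
        )
    db.commit()


if __name__ == "__main__":
    # Hand-written events for a single page load (values are illustrative).
    events = [
        {"method": "Network.requestWillBeSent",
         "params": {"requestId": "1", "request": {"method": "GET", "url": "https://example.com/"}}},
        {"method": "Network.responseReceived",
         "params": {"requestId": "1", "response": {"status": 200, "protocol": "h2",
                                                   "remoteIPAddress": "192.0.2.1",
                                                   "mimeType": "text/html"}}},
        {"method": "Network.loadingFinished",
         "params": {"requestId": "1", "encodedDataLength": 1024}},
    ]
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    store(correlate(events), {"1": {"body": "<html></html>", "base64Encoded": False}},
          db, Path(tempfile.mkdtemp()))
    print(db.execute("SELECT method, url, status, protocol FROM exchanges").fetchall())
    # prints [('GET', 'https://example.com/', 200, 'h2')]
```

Splitting storage this way, with compact metadata rows in a relational table and potentially large bodies on disk, mirrors the abstract's "database and file system" description, but the specific layout above is only one plausible design.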