
Web Crawler for Event-Driven Crawling of AJAX-Based Web Applications



Abstract

This paper describes a novel technique for crawling AJAX-based applications through "event-driven" crawling in web browsers. The algorithm uses the browser context to analyse the DOM: it scans the DOM tree, detects elements capable of changing the state, triggers events on those elements, and extracts the resulting dynamic DOM content. An AJAX web application serves as a running example to explain the approach. The authors implement the concepts and algorithms discussed in the paper in a tool, and report a number of empirical studies in which they apply the approach to representative AJAX applications. The results show that the method performs better, often with a faster rate of state discovery. "Event-driven" crawling can effectively and accurately crawl dynamic content from AJAX-based applications.
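The loop described in the abstract (scan the DOM, detect state-changing elements, trigger their events, extract the new DOM) can be illustrated with a small sketch. This is not the authors' tool: it assumes a toy in-memory page in place of a real browser, and every name in it (`home_page`, `HANDLERS`, `candidate_elements`, `crawl`) is hypothetical. Each DOM is a tuple of `(tag, text, handler)` elements, and the handlers stand in for AJAX callbacks that rewrite the DOM.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Toy model (an assumption, not the paper's data structures):
# an element is (tag, text, click-handler name or "" if none).
Element = Tuple[str, str, str]
Dom = Tuple[Element, ...]
Handler = Callable[[Dom], Dom]


def home_page() -> Dom:
    """Initial DOM of the hypothetical application."""
    return (
        ("h1", "Home", ""),
        ("a", "News", "load_news"),
        ("a", "About", "load_about"),
    )


# Hypothetical handlers standing in for AJAX callbacks: each returns a new DOM.
HANDLERS: Dict[str, Handler] = {
    "load_home": lambda dom: home_page(),
    "load_news": lambda dom: (
        ("h1", "News", ""),
        ("a", "Home", "load_home"),
        ("a", "More", "load_more"),
    ),
    "load_about": lambda dom: (
        ("h1", "About", ""),
        ("a", "Home", "load_home"),
    ),
    "load_more": lambda dom: (
        ("h1", "More news", ""),
        ("a", "Back", "load_news"),
    ),
}


def candidate_elements(dom: Dom) -> List[int]:
    """Scan the DOM and return indices of elements that can change the state."""
    return [i for i, (_, _, handler) in enumerate(dom) if handler]


def crawl(start: Dom, handlers: Dict[str, Handler], max_states: int = 50):
    """Breadth-first, event-driven exploration of DOM states.

    Returns the discovered states in discovery order and the list of
    (source DOM, clicked label, target DOM) transitions.
    """
    states: List[Dom] = [start]
    seen = {start}
    transitions: List[Tuple[Dom, str, Dom]] = []
    queue = deque([start])
    while queue:
        dom = queue.popleft()
        for index in candidate_elements(dom):
            _, label, handler_name = dom[index]
            new_dom = handlers[handler_name](dom)  # "trigger" the event
            transitions.append((dom, label, new_dom))
            # Record the extracted DOM only if it is a state not seen before.
            if new_dom not in seen and len(states) < max_states:
                seen.add(new_dom)
                states.append(new_dom)
                queue.append(new_dom)
    return states, transitions


if __name__ == "__main__":
    found, edges = crawl(home_page(), HANDLERS)
    for dom in found:
        print(dom[0][1], "->", [el[1] for el in dom[1:]])
    print(len(found), "states,", len(edges), "transitions")
```

In a real browser the comparison of DOM states and the choice of candidate elements are the hard parts; the sketch sidesteps both by using hashable tuples and explicit handler names.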
