Journal: Journal of Supercomputing

GUIDE: an interactive and incremental approach for crawling Web applications



Abstract

The Internet, with its sea of Web applications, is one of the largest data stores for big data analysis. Web crawlers have been used extensively to explore and retrieve the states (pages) of Web applications. Most crawlers allow users to define a few crawling directives to increase the coverage of states that the crawler can explore. A directive can, for example, assign an input value to a specified input field so that the application is instructed to perform a specific action and visit some special states. A crawler, however, is supposed to be capable of exploring an unknown application, and given an unknown application, how could the user possibly prepare the required directives in advance? This paper proposes an interactive crawling approach and a crawler called GUIDE to overcome this issue. Instead of passively receiving directives from the user, GUIDE actively asks the user for directives when it finds Web pages containing input fields. In addition, GUIDE offers a hierarchical directive structure, allowing the user to define multiple values for the same input field. A case study with three Web applications indicated that (1) interactive directives were very useful for increasing the code coverage of the application being explored, improving code coverage by 10.3-50.5%, and (2) using GUIDE is more efficient than using a traditional crawler: given the same amount of time, a code coverage improvement of up to 11% can be achieved.
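The interactive idea in the abstract can be illustrated with a minimal Python sketch. It is not GUIDE's implementation: the toy application `TOY_APP`, the function `crawl`, and the callback `ask_user` are all hypothetical names introduced here, and a flat mapping from (state, field) to a list of values stands in for the hierarchical directive structure, whose details the abstract does not give. The sketch only shows the control flow: the crawler explores states breadth-first and, when it reaches a page with an input field it has no directive for, asks the user for one or more values and then tries each of them.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy application: each state lists its input fields, its plain
# links, and which state a (field, value) submission leads to.
TOY_APP: Dict[str, dict] = {
    "home": {"fields": [], "links": ["login"], "submit": {}},
    "login": {
        "fields": ["username"],
        "links": [],
        "submit": {
            ("username", "admin"): "admin_panel",
            ("username", "guest"): "guest_page",
        },
    },
    "admin_panel": {"fields": [], "links": [], "submit": {}},
    "guest_page": {"fields": [], "links": [], "submit": {}},
}

# The callback plays the role of the user: given a state and a field name,
# it returns the values the crawler should try (possibly several).
AskUser = Callable[[str, str], List[str]]
Directives = Dict[Tuple[str, str], List[str]]


def crawl(app: Dict[str, dict], start: str, ask_user: AskUser) -> Tuple[Set[str], Directives]:
    """Breadth-first crawl that requests directives on demand (assumed design)."""
    visited: Set[str] = set()
    directives: Directives = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in visited:
            continue
        visited.add(state)
        page = app[state]
        queue.extend(page["links"])
        for field in page["fields"]:
            key = (state, field)
            # Ask only once per field; the answers are kept for later reuse.
            if key not in directives:
                directives[key] = ask_user(state, field)
            # Try every value supplied for this field.
            for value in directives[key]:
                next_state = page["submit"].get((field, value))
                if next_state is not None:
                    queue.append(next_state)
    return visited, directives


def scripted_user(state: str, field: str) -> List[str]:
    """Stand-in for a real interactive prompt such as input()."""
    return {"username": ["admin", "guest"]}.get(field, [])


visited, directives = crawl(TOY_APP, "home", scripted_user)
print(sorted(visited))  # ['admin_panel', 'guest_page', 'home', 'login']
```

With a callback that returns no values, the same crawl stops at `home` and `login`, which mirrors the abstract's point that states behind input fields stay unreachable without suitable directives.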
