Indexing Rich Internet Applications Using Components-Based Crawling

Abstract

Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines, with DOMs as states and JavaScript event executions as transitions. This approach fails when used with "real-life", complex RIAs, because the size of the produced model is much too large to be practical. In this paper, we propose a new method to crawl AJAX-based RIAs in an efficient manner by detecting "components", which are areas of the DOM that are independent from each other, and by crawling each component separately. This leads to a dramatic reduction of the required state space for the model, without loss of content coverage. Our method requires neither prior knowledge of the RIA nor predefined component definitions. Instead, we infer the components by observing the behavior of the RIA during crawling. Our experimental results show that our method can quickly and completely index industrial RIAs that are simply out of reach for traditional methods.
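
The state-space claim in the abstract can be illustrated with a small counting sketch. This is not the authors' crawler or their component-detection algorithm; it only assumes a toy page whose widgets are already known to be independent, and the widget names and states below are hypothetical. A whole-page state machine must distinguish every combination of widget states (a product), while crawling each component separately only visits each widget's own states (a sum).

```python
from itertools import product

# Hypothetical toy RIA: three independent widgets, each with its own local states.
# These names and states are illustrative assumptions, not taken from the paper.
components = {
    "menu": ["closed", "open"],
    "news_ticker": ["item1", "item2", "item3"],
    "login_box": ["hidden", "shown", "error"],
}


def global_dom_states(components):
    """Enumerate whole-page states, as a DOM-level state machine would see them."""
    names = sorted(components)
    combos = product(*(components[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def per_component_states(components):
    """Enumerate (component, local state) pairs, one component at a time."""
    return [
        (name, state)
        for name in sorted(components)
        for state in components[name]
    ]


whole_page = global_dom_states(components)
separate = per_component_states(components)

print(len(whole_page))  # 2 * 3 * 3 = 18 whole-page states
print(len(separate))    # 2 + 3 + 3 = 8 per-component states
```

Every local widget state that appears in some whole-page state also appears in the per-component list, which is the sense in which the smaller model loses no content in this toy setting. Real RIAs need the independence of components to be inferred during crawling, which this sketch does not attempt.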

Bibliographic details

Source
International Conference on Web Engineering | 2014 | pp. 200-217 | 18 pages
Authors
Ali Moosavi; Salman Hooshmand; Sara Baghbanzadeh; Guy-Vincent Jourdan; Gregor V. Bochmann; Iosif Viorel Onut;
Original format: PDF
Keywords
Rich Internet Applications; Web Crawling; Web Application Modeling;

Similar literature

1. Model-Based Rich Internet Applications Crawling: 'Menu' and 'Probability' Models [J]. Suryakant Choudhary, Emre Dincturk, Seyed Mirtaheri. Journal of Web Engineering. 2014, No. 3&4

2. A Model-Based Approach for Crawling Rich Internet Applications [J]. Mustafa Emre Dincturk, Guy-Vincent Jourdan, Gregor V. Bochmann. ACM Transactions on the Web. 2014, No. 3

3. Rich Internet Applications: Richer Platforms, Richer Targets [J]. Larry Seltzer. eWeek. 2010, No. 15

4. Indexing Rich Internet Applications Using Components-Based Crawling [C]. Ali Moosavi, Salman Hooshmand, Sara Baghbanzadeh. International Conference on Web Engineering. 2014

5. Model-based Crawling - An Approach to Design Efficient Crawling Strategies for Rich Internet Applications [D]. Mustafa Emre Dincturk. 2013

6. A Rich Internet Application for Remote Visualization and Collaborative Annotation of Digital Slides in Histology and Cytology [O]. Raphaël Marée, Benjamin Stévens, Loïc Rollus. 2013

7. Indexing Rich Internet Applications Using Components-Based Crawling [O]. Ali Moosavi, Salman Hooshmand, Sara Baghbanzadeh. 2014
