Conference: International Conference on Web Information Systems Engineering
PDist-RIA Crawler: A Peer-to-Peer Distributed Crawler for Rich Internet Applications

Abstract

Crawling Rich Internet Applications (RIAs) is important for ensuring their security and accessibility and for indexing them for search. To crawl a RIA, the crawler has to reach every application state and execute every application event. On a large RIA, this operation takes a long time. The previously published GDist-RIA Crawler proposes a distributed architecture that parallelizes the task of crawling RIAs and runs the crawl over multiple computers to reduce crawl time. In GDist-RIA Crawler, a centralized unit calculates the next task to execute, and tasks are dispatched to worker nodes for execution. This architecture is not scalable because the centralized unit is bound to become a bottleneck as the number of nodes increases. This paper extends GDist-RIA Crawler and proposes a fully peer-to-peer, scalable architecture for crawling RIAs, called PDist-RIA Crawler. PDist-RIA does not have the same scalability limitations, while matching the performance of GDist-RIA. We describe a prototype showing the scalability and performance of the proposed solution.