PDist-RIA Crawler: A Peer-to-Peer Distributed Crawler for Rich Internet Applications

Abstract

Crawling Rich Internet Applications (RIAs) is important for ensuring their security and accessibility and for indexing them for search. To crawl a RIA, the crawler has to reach every application state and execute every application event. On a large RIA, this operation takes a long time. The previously published GDist-RIA Crawler proposes a distributed architecture that parallelizes the task of crawling RIAs and runs the crawl over multiple computers to reduce crawl time. In GDist-RIA Crawler, a centralized unit calculates the next task to execute, and tasks are dispatched to worker nodes for execution. This architecture is not scalable because the centralized unit is bound to become a bottleneck as the number of nodes increases. This paper extends GDist-RIA Crawler and proposes a fully peer-to-peer, scalable architecture for crawling RIAs, called PDist-RIA Crawler. PDist-RIA does not share GDist-RIA's scalability limitations while matching its performance. We describe a prototype that demonstrates the scalability and performance of the proposed solution.
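
The crawl requirement stated in the abstract (reach every application state, execute every application event) can be illustrated with a small sketch. It is not the authors' algorithm: the toy application `TOY_RIA`, the `owner` hashing rule, and the `crawl` function are all hypothetical, and the "nodes" are simulated sequentially in one process. It only assumes that, without a central coordinator, each state can be assigned to a node by a rule every peer computes independently.

```python
from collections import deque
import zlib

# Hypothetical toy RIA: state -> {event name -> resulting state}.
TOY_RIA = {
    "s0": {"click_a": "s1", "click_b": "s2"},
    "s1": {"click_c": "s3"},
    "s2": {"click_c": "s3"},
    "s3": {},
}


def owner(state, num_nodes):
    """Assumed ownership rule: a stable hash of the state id picks the node."""
    return zlib.crc32(state.encode("utf-8")) % num_nodes


def crawl(ria, initial, num_nodes):
    """Reach every state and execute every event exactly once.

    Returns the discovered states, the executed (state, event) pairs,
    and how many events each simulated node executed.
    """
    seen = {initial}
    executed = []
    load = [0] * num_nodes
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        node = owner(state, num_nodes)  # node responsible for this state
        for event, target in sorted(ria[state].items()):
            executed.append((state, event))
            load[node] += 1
            if target not in seen:  # newly discovered state
                seen.add(target)
                queue.append(target)
    return seen, executed, load


states, events, load = crawl(TOY_RIA, "s0", num_nodes=2)
```

Because every peer can evaluate `owner` locally, no single unit has to compute the next task for all workers, which is the bottleneck the abstract attributes to the centralized design.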

Bibliographic record

Source
International Conference on Web Information Systems Engineering | 2014 | pp. 365-380 | 16 pages
Authors
Seyed M. Mirtaheri; Gregor V. Bochmann; Guy-Vincent Jourdan; Iosif Viorel Onut
Original format: PDF
Keywords
Web Crawling; Rich Internet Application; Peer-to-Peer Algorithm; Crawling Strategies
