An Effective Web Crawling Algorithm for Crawling Ajax Pages

李华波; 吴礼发; 赖海光; 郑成辉; 黄康宇

Abstract

Generating and navigating Ajax web pages requires executing client-side JavaScript, so traditional crawling algorithms cannot extract the complete content of an Ajax page. This paper analyzes how Ajax works, describes the main problems in crawling Ajax web pages, and proposes and implements an effective algorithm for crawling them. The algorithm drives a client browser to generate page content dynamically and to carry out page navigation; it assigns an identification number to each crawled page and generates a corresponding static page. Experimental results show that the proposed algorithm crawls noticeably more Ajax pages than traditional methods, and that its dual duplicate-elimination strategy effectively reduces the algorithm's running time.
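The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the two duplicate checks are (1) skipping a JavaScript handler call that has already been executed and (2) skipping a resulting page whose content hash has already been seen. The browser is replaced by two hypothetical in-memory tables (`EVENTS_ON_PAGE`, `HANDLERS`); a real crawler would drive an actual browser instead.

```python
import hashlib
from collections import deque

# Hypothetical stand-in for a browser: the handler calls attached to each
# DOM state, and the DOM content each call would produce when executed.
EVENTS_ON_PAGE = {
    "<div>home</div>": ["load('news')", "load('about')"],
    "<div>news</div>": ["load('home')", "load('news', 1)"],
    "<div>about</div>": ["load('home')", "load('news')"],
}
HANDLERS = {
    "load('home')": "<div>home</div>",
    "load('news')": "<div>news</div>",
    "load('news', 1)": "<div>news</div>",  # different call, same content
    "load('about')": "<div>about</div>",
}


def content_hash(dom: str) -> str:
    """Fingerprint of a page's content, used for the second duplicate check."""
    return hashlib.sha1(dom.encode("utf-8")).hexdigest()


def crawl(start_dom: str):
    """Breadth-first crawl; returns ({page id: static snapshot}, calls executed)."""
    page_ids = {content_hash(start_dom): 0}  # content hash -> page id
    snapshots = {0: start_dom}               # page id -> static snapshot
    fired_calls = set()                      # handler calls already executed
    executed = 0
    queue = deque([start_dom])

    while queue:
        dom = queue.popleft()
        for call in EVENTS_ON_PAGE.get(dom, []):
            # Assumed check 1: never execute the same handler call twice.
            if call in fired_calls:
                continue
            fired_calls.add(call)
            new_dom = HANDLERS[call]  # a real crawler would run the JavaScript here
            executed += 1

            # Assumed check 2: discard pages whose content was already crawled.
            key = content_hash(new_dom)
            if key in page_ids:
                continue
            page_ids[key] = len(page_ids)  # assign the next identification number
            snapshots[page_ids[key]] = new_dom
            queue.append(new_dom)

    return snapshots, executed
```

In this toy model, `crawl("<div>home</div>")` finds three distinct pages while executing four of the six candidate handler calls. The first check avoids the two repeated calls, and the second keeps revisited content from being queued again.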

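Bibliographic Information

Source
《电子科技大学学报》 (Journal of University of Electronic Science and Technology of China) | 2013, No. 1 | pp. 115-120 | 6 pages
Authors
李华波; 吴礼发; 赖海光; 郑成辉; 黄康宇
Author affiliation

College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007 (all five authors)

Original format: PDF
Language of text: Chinese
Chinese Library Classification: TP393.08
Keywords
Ajax; crawling algorithm; duplicate-elimination strategy; search engine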

