Conference: The International Conference on Information Networking 2012

Design and implementation of web crawler based on dynamic web collection cycle

Abstract

The amount of web information is increasing rapidly with advanced wireless networks and the emergence of diverse smart devices such as the iPhone and iPad. Information is continuously produced and updated anywhere and at any time through easy-to-use web platforms and social networks. How often frequently updated web data must be refreshed has therefore become a pressing issue in the data integration and retrieval domain. In this paper, we propose dynamic web-data crawling methods, which include sensitive checking of web site changes and dynamic retrieval of web pages from target web sites. Furthermore, we implemented a Java-based web crawling application and compared the performance of conventional static approaches with that of our proposed dynamic ones. Our experimental results showed a 59% performance benefit compared to the static crawling method.
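
The abstract does not spell out how the collection cycle is adjusted, so the following is only a minimal sketch of one plausible reading: detect a change by comparing content hashes, then shorten the revisit interval for pages that changed and lengthen it for pages that did not. The halving/doubling rule, the use of SHA-256, the bounds, and every name here (`PageState`, `update_cycle`, `MIN_INTERVAL`, `MAX_INTERVAL`) are assumptions made for illustration, not the authors' method. Python is used for brevity, although the paper's application was Java-based.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds on the collection cycle, in seconds.
MIN_INTERVAL = 60.0
MAX_INTERVAL = 86_400.0


@dataclass
class PageState:
    """Per-page bookkeeping (illustrative only, not from the paper)."""
    interval: float = 3_600.0      # current collection cycle in seconds
    digest: Optional[str] = None   # hash of the most recently fetched body


def content_digest(body: bytes) -> str:
    """Fingerprint the page body so two fetches can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def update_cycle(state: PageState, body: bytes) -> bool:
    """Record a fetch and adapt the interval; return True if the page changed.

    Assumed policy: halve the interval after a change, double it after an
    unchanged fetch, and leave it alone on the very first visit.
    """
    new_digest = content_digest(body)
    first_visit = state.digest is None
    changed = (not first_visit) and new_digest != state.digest
    if changed:
        state.interval = max(MIN_INTERVAL, state.interval / 2)
    elif not first_visit:
        state.interval = min(MAX_INTERVAL, state.interval * 2)
    state.digest = new_digest
    return changed
```

A static crawler, by contrast, would revisit every page at one fixed interval regardless of how often it changes. In this sketch a scheduler would call `update_cycle` after each fetch and queue the next visit `state.interval` seconds later.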