International Journal of Computers & Applications

EFFICIENT BANDWIDTH UTILIZATION FOR DOWNLOADING WEB PAGES


Abstract

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. The latest crawling techniques in use are parallel crawling and hierarchical crawling. In the latter, an entire Web site is extracted by dividing it into levels. The homepage from which the crawl starts is the first level; all the hyperlinks on that page together form the next level, and so on. In this process, all the Web pages at a single level are downloaded simultaneously by multiple crawlers that are created dynamically according to the number of hyperlinks at that level. In real-life scenarios, however, the available bandwidth is limited and acts as a constraint. This paper proposes a scheduling algorithm based on the sizes of the Web pages to make full use of the available bandwidth. To achieve this, a modified type of queue (Y-type) is introduced in which the URLs of the Web pages are kept in order and released in such a way that the total size of the Web pages issued is closest to the available bandwidth.
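The abstract does not describe how the Y-type queue works internally, so the following is only a rough sketch of the general idea of size-based release, not the paper's algorithm. It assumes that page sizes are known in advance as integers in KB, that the queue is ordered by ascending size, and that "closest to the bandwidth" means the largest total that does not exceed a per-round capacity. The names `pick_batch`, `schedule`, `sizes_kb`, and `capacity_kb` are hypothetical, and the selection step is an ordinary subset-sum dynamic program swapped in for illustration.

```python
from typing import Dict, List


def pick_batch(sizes_kb: Dict[str, int], capacity_kb: int) -> List[str]:
    """Pick URLs whose total size is as large as possible without
    exceeding capacity_kb (subset-sum dynamic programming)."""
    # best[t] holds one set of URLs whose sizes sum to exactly t.
    best: Dict[int, List[str]] = {0: []}
    # Assumption: the queue is kept in ascending order of page size.
    for url, size in sorted(sizes_kb.items(), key=lambda kv: kv[1]):
        # Iterate over a snapshot so each page is used at most once.
        for total, chosen in list(best.items()):
            new_total = total + size
            if new_total <= capacity_kb and new_total not in best:
                best[new_total] = chosen + [url]
    return best[max(best)]


def schedule(sizes_kb: Dict[str, int], capacity_kb: int) -> List[List[str]]:
    """Release all pending URLs of one level in successive batches."""
    pending = dict(sizes_kb)
    batches: List[List[str]] = []
    while pending:
        batch = pick_batch(pending, capacity_kb)
        if not batch:
            # Every remaining page exceeds the capacity on its own;
            # release the smallest one alone so the loop still ends.
            batch = [min(pending, key=pending.get)]
        for url in batch:
            del pending[url]
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # Hypothetical page sizes for a single crawl level.
    level = {"a": 40, "b": 70, "c": 30, "d": 60}
    for i, batch in enumerate(schedule(level, capacity_kb=100), start=1):
        print(i, batch, sum(level[u] for u in batch))
```

With these sample sizes and a capacity of 100 KB, both rounds fill the capacity exactly, whereas releasing all four pages at once would demand 200 KB. The dynamic program takes time proportional to the number of pages times the capacity, so a greedy fit may be preferable when a level contains many links.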
