Realistic Traffic Generation for Web Robots

Abstract

Realistic web traffic generators are critical to evaluating the capacity, scalability, and availability of web systems. Although web traffic generation is a classic research problem, no existing generator accounts for the characteristics of web robots or crawlers, which are now the dominant source of traffic to a web server. Administrators are thus unable to test, stress, and evaluate how their systems perform in the face of ever-increasing levels of web robot traffic. To resolve this problem, this paper introduces a novel approach to generating synthetic web robot traffic with high fidelity. The approach captures both the temporal and behavioral qualities of robot traffic using statistical and Bayesian models fitted to the properties of robot traffic observed in web logs from North America and Europe. We evaluate our traffic generator by comparing the characteristics of the generated traffic with those of the original data, specifically session arrival rates, inter-arrival times, and session lengths. Finally, we show that our generated traffic affects cache performance similarly to actual traffic under the common LRU and LFU eviction policies.
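
The cache comparison mentioned at the end of the abstract can be illustrated with a minimal sketch. It replays a request trace through a fixed-size cache and reports the hit ratio under LRU and LFU eviction. The function names, the trace format (a list of resource identifiers), and the LFU tie-breaking rule are assumptions made for illustration; the abstract does not describe the authors' actual evaluation setup.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, capacity):
    """Fraction of requests in `trace` served by an LRU cache (hypothetical helper)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for resource in trace:
        if resource in cache:
            hits += 1
            cache.move_to_end(resource)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[resource] = True
    return hits / len(trace) if trace else 0.0


def lfu_hit_ratio(trace, capacity):
    """Fraction of requests in `trace` served by an LFU cache (hypothetical helper).

    Assumption: ties between equally frequent entries are broken by evicting
    the one that was inserted into the cache earliest.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    counts = {}  # resource -> access count while cached; dicts keep insertion order
    hits = 0
    for resource in trace:
        if resource in counts:
            hits += 1
            counts[resource] += 1
        else:
            if len(counts) >= capacity:
                # min() returns the first minimal key in insertion order
                victim = min(counts, key=counts.get)
                del counts[victim]
            counts[resource] = 1
    return hits / len(trace) if trace else 0.0
```

Running both functions on a real robot trace and on a generated one, at the same cache capacity, gives two pairs of hit ratios; the closer the pairs, the more similarly the two traces stress the cache.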
