IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Maximum Sustainable Throughput Prediction for Data Stream Processing over Public Clouds

Abstract

In cloud-based stream processing services, the maximum sustainable throughput (MST) is defined as the maximum throughput that a system composed of a fixed number of virtual machines (VMs) can ingest indefinitely. If the incoming data rate exceeds the system's MST, unprocessed data accumulates, eventually making the system inoperable. It is therefore important for the service provider to keep the MST larger than the incoming data rate at all times by dynamically changing the number of VMs used by the system. In this paper, we identify a common data processing environment used by modern data stream processing systems and propose MST prediction models for this environment. We train the models using linear regression on samples obtained from a few VMs and predict MST for larger numbers of VMs. To minimize the time and cost of model training, we statistically determine a set of training samples using Intel's Storm benchmarks with representative resource usage patterns. Using typical use-case benchmarks on Amazon's EC2 public cloud, our experiments show that, training with up to 8 VMs, we can predict MST for streaming applications with an average prediction error of less than 4% for 12 VMs, 9% for 16 VMs, and 32% for 24 VMs. Further, we evaluate our prediction models with simulation-based elastic VM scheduling on a realistic workload. These simulations show that, with 10% over-provisioning, the cost efficiency of our proposed models is on par with that of an optimal scaling policy, without incurring any service level agreement violations.
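The abstract describes fitting MST prediction models with linear regression on measurements taken at small cluster sizes and extrapolating to larger ones. Below is a minimal sketch of that idea, assuming VM count as the only feature; the throughput numbers are made-up placeholders, and the paper's actual feature set, sample selection, and model form are not given here.

```python
# Minimal sketch: fit a linear model MST ≈ slope * n_vms + intercept from a few
# small configurations, then extrapolate to larger cluster sizes. The training
# samples below are illustrative placeholders, not data from the paper.
import numpy as np

# Hypothetical training samples: (number of VMs, measured MST in events/sec)
train_vms = np.array([2, 4, 6, 8], dtype=float)
train_mst = np.array([41_000, 79_000, 118_000, 155_000], dtype=float)

# Ordinary least squares with design matrix [n_vms, 1].
X = np.column_stack([train_vms, np.ones_like(train_vms)])
slope, intercept = np.linalg.lstsq(X, train_mst, rcond=None)[0]

def predict_mst(n_vms: int) -> float:
    """Predicted maximum sustainable throughput for a cluster of n_vms VMs."""
    return slope * n_vms + intercept

for n in (12, 16, 24):
    print(f"{n:2d} VMs -> predicted MST ≈ {predict_mst(n):,.0f} events/s")
```

The example trains on configurations of up to 8 VMs and extrapolates to 12, 16, and 24 VMs, matching the cluster sizes at which the paper reports its prediction errors.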
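The simulation study pairs these predictions with elastic VM scheduling under 10% over-provisioning. The sketch below shows one plausible reading of that rule: scale to the smallest VM count whose predicted MST covers the incoming rate plus 10% headroom. The coefficients, rate values, and the vms_needed helper are hypothetical; the paper's actual scheduling policy is not detailed in the abstract.

```python
# Minimal sketch of an over-provisioning scaling rule: keep predicted MST at
# least 10% above the incoming data rate. The linear coefficients stand in for
# a fit like the one sketched above; all numbers are illustrative.
SLOPE, INTERCEPT = 19_000.0, 3_500.0   # hypothetical fitted model: MST(n) = SLOPE * n + INTERCEPT
OVERPROVISION = 1.10                   # 10% headroom over the observed incoming rate
MAX_VMS = 24                           # illustrative upper bound on cluster size

def predict_mst(n_vms: int) -> float:
    return SLOPE * n_vms + INTERCEPT

def vms_needed(incoming_rate: float) -> int:
    """Smallest VM count whose predicted MST covers the padded incoming rate."""
    target = OVERPROVISION * incoming_rate
    for n in range(1, MAX_VMS + 1):
        if predict_mst(n) >= target:
            return n
    return MAX_VMS  # saturate at the maximum cluster size

# Example: react to a (made-up) workload trace, rates in events/sec.
for rate in (60_000, 150_000, 320_000):
    print(f"rate {rate:>7,} events/s -> scale to {vms_needed(rate)} VMs")
```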
