IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Maximum Sustainable Throughput Prediction for Data Stream Processing over Public Clouds

Abstract

In cloud-based stream processing services, the maximum sustainable throughput (MST) is defined as the maximum throughput that a system composed of a fixed number of virtual machines (VMs) can ingest indefinitely. If the incoming data rate exceeds the system's MST, unprocessed data accumulates, eventually making the system inoperable. It is therefore important for the service provider to keep the MST larger than the incoming data rate at all times by dynamically changing the number of VMs used by the system. In this paper, we identify a common data processing environment used by modern data stream processing systems and propose MST prediction models for this environment. We train the models using linear regression on samples obtained from a few VMs and predict MST for larger numbers of VMs. To minimize the time and cost of model training, we statistically determine a set of training samples using Intel's Storm benchmarks with representative resource usage patterns. Using typical use-case benchmarks on Amazon's EC2 public cloud, our experiments show that, training with up to 8 VMs, we can predict MST for streaming applications with an average prediction error of less than 4% for 12 VMs, 9% for 16 VMs, and 32% for 24 VMs. Further, we evaluate our prediction models with simulation-based elastic VM scheduling on a realistic workload. These simulations show that, with 10% over-provisioning, the cost efficiency of our proposed models is on par with that of an optimal scaling policy, without incurring any service level agreement violations.
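The abstract describes fitting MST prediction models with linear regression on measurements taken at small cluster sizes and extrapolating to larger ones. Below is a minimal sketch of that idea, assuming VM count as the only feature; the throughput numbers are made-up placeholders, and the paper's actual feature set, sample selection, and model form are not given here.

```python
# Minimal sketch: fit a linear model MST ≈ slope * n_vms + intercept from a few
# small configurations, then extrapolate to larger cluster sizes. The training
# samples below are illustrative placeholders, not data from the paper.
import numpy as np

# Hypothetical training samples: (number of VMs, measured MST in events/sec)
train_vms = np.array([2, 4, 6, 8], dtype=float)
train_mst = np.array([41_000, 79_000, 118_000, 155_000], dtype=float)

# Ordinary least squares with design matrix [n_vms, 1].
X = np.column_stack([train_vms, np.ones_like(train_vms)])
slope, intercept = np.linalg.lstsq(X, train_mst, rcond=None)[0]

def predict_mst(n_vms: int) -> float:
    """Predicted maximum sustainable throughput for a cluster of n_vms VMs."""
    return slope * n_vms + intercept

for n in (12, 16, 24):
    print(f"{n:2d} VMs -> predicted MST ≈ {predict_mst(n):,.0f} events/s")
```

The example trains on configurations of up to 8 VMs and extrapolates to 12, 16, and 24 VMs, matching the cluster sizes at which the paper reports its prediction errors.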
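The simulation study pairs these predictions with elastic VM scheduling under 10% over-provisioning. The sketch below shows one plausible reading of that rule: scale to the smallest VM count whose predicted MST covers the incoming rate plus 10% headroom. The coefficients, rate values, and the vms_needed helper are hypothetical; the paper's actual scheduling policy is not detailed in the abstract.

```python
# Minimal sketch of an over-provisioning scaling rule: keep predicted MST at
# least 10% above the incoming data rate. The linear coefficients stand in for
# a fit like the one sketched above; all numbers are illustrative.
SLOPE, INTERCEPT = 19_000.0, 3_500.0   # hypothetical fitted model: MST(n) = SLOPE * n + INTERCEPT
OVERPROVISION = 1.10                   # 10% headroom over the observed incoming rate
MAX_VMS = 24                           # illustrative upper bound on cluster size

def predict_mst(n_vms: int) -> float:
    return SLOPE * n_vms + INTERCEPT

def vms_needed(incoming_rate: float) -> int:
    """Smallest VM count whose predicted MST covers the padded incoming rate."""
    target = OVERPROVISION * incoming_rate
    for n in range(1, MAX_VMS + 1):
        if predict_mst(n) >= target:
            return n
    return MAX_VMS  # saturate at the maximum cluster size

# Example: react to a (made-up) workload trace, rates in events/sec.
for rate in (60_000, 150_000, 320_000):
    print(f"rate {rate:>7,} events/s -> scale to {vms_needed(rate)} VMs")
```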
