Conference: IEEE International Conference on Cloud Computing

Performance Analysis of Large-Scale Distributed Stream Processing Systems on the Cloud

Abstract

Real-time data processing is often a necessity, as it can provide insights that lose value if discovered off-line or after the fact. However, large-scale stream processing systems are non-trivial to build and deploy. While many frameworks allow users to create large-scale distributed systems, many challenges remain in understanding their performance, deployment cost and considerations, and the impact of potential (partial) outages on real-time system performance. Our work considers the performance of Cloud-based stream processing systems in terms of back-pressure and expected utilization. We explore the performance of an exemplar stream application using different Cloud-based virtual machine resources, taking the scale of deployment and cost benefits into consideration in relation to overall performance. To achieve this, we develop an algorithm based on queueing theory that predicts the throughput and latency of stream data processing while supporting system stability. Our methodology for making fundamental measurements is applicable to mainstream stream processing frameworks such as Apache Storm and Heron. It is especially suitable for large-scale distributed stream processing, where jobs can run for extended periods. We benchmark the performance of the system on the national research cloud of Australia (Nectar) and present a performance analysis based on estimating the overall effective utilization.
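The abstract does not spell out the authors' queueing model. The sketch below is therefore only a rough illustration of how queueing theory can predict throughput, latency and stability for a stream topology. It treats each operator instance as an independent M/M/1 queue in a linear (tandem) pipeline. That is a simplifying assumption, because real Storm/Heron tuple arrivals and service times are generally neither Poisson nor exponential. `Operator`, `analyse_pipeline`, the even load split across instances, and the instance-weighted "effective utilization" definition are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Operator:
    """One stage of a linear stream topology (e.g. a Storm/Heron spout or bolt)."""
    name: str
    service_rate: float  # mu: tuples/s one instance can process (assumed exponential)
    parallelism: int = 1  # number of instances; input assumed to be split evenly


def analyse_pipeline(arrival_rate: float, operators: List[Operator]) -> Dict:
    """Estimate throughput, latency and utilization of a linear pipeline.

    Hypothetical model: every instance is an independent M/M/1 queue, so the
    pipeline is a Jackson tandem network. This is NOT the paper's algorithm.
    """
    # The slowest stage (by total capacity mu * parallelism) bounds throughput.
    capacity = min(op.service_rate * op.parallelism for op in operators)
    # Queueing stability requires the offered load to stay strictly below capacity.
    stable = arrival_rate < capacity
    # With back-pressure, an overloaded source is throttled down to the bottleneck rate.
    throughput = min(arrival_rate, capacity)

    stages = []
    for op in operators:
        lam = throughput / op.parallelism  # arrival rate per instance
        rho = lam / op.service_rate        # utilization of each instance
        # Mean M/M/1 sojourn time W = 1 / (mu - lambda); unbounded when rho >= 1.
        latency = 1.0 / (op.service_rate - lam) if rho < 1.0 else float("inf")
        stages.append({"name": op.name, "utilization": rho, "latency_s": latency})

    # Hypothetical "effective utilization": mean busy fraction over all instances.
    total_instances = sum(op.parallelism for op in operators)
    effective_util = sum(s["utilization"] * op.parallelism
                         for s, op in zip(stages, operators)) / total_instances

    return {
        "stable": stable,
        "throughput": throughput,
        "end_to_end_latency_s": sum(s["latency_s"] for s in stages),
        "effective_utilization": effective_util,
        "stages": stages,
    }


if __name__ == "__main__":
    pipeline = [Operator("parse", 100.0), Operator("enrich", 50.0, 2), Operator("sink", 200.0)]
    for rate in (80.0, 150.0):
        r = analyse_pipeline(rate, pipeline)
        print(rate, r["stable"], r["throughput"],
              round(r["end_to_end_latency_s"], 4), round(r["effective_utilization"], 3))
```

At 80 tuples/s the example pipeline is stable, with a predicted end-to-end latency of about 0.158 s. At 150 tuples/s, back-pressure caps throughput at the bottleneck capacity of 100 tuples/s. In this idealized model, the saturated stages then have unbounded queueing latency, which is one way to see why keeping utilization below 1 matters for stability.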