
MPLEX: In-Situ Big Data Processing with Compute-Storage Multiplexing


Abstract

Cloud-based services are increasingly popular for big data analytics because elastic resources can be provisioned on demand, offering flexibility, scalability, and cost-effectiveness. However, data analytics-as-a-service suffers from the overhead of moving data between compute and storage clusters, which are decoupled in existing cloud infrastructure. We propose in-situ big data processing on cloud storage: data-intensive jobs are dynamically offloaded from the compute cluster to the storage cluster to improve job throughput. This is challenging because additional workload on the storage cluster can significantly affect the interactive web requests that fetch cloud storage data, which are subject to a strict SLA (service-level agreement) on tail latency. We present MPLEX, a system that augments data analytics-as-a-service by efficiently multiplexing the compute and storage clusters to improve job throughput without violating the cloud storage service's SLA on tail response time. MPLEX applies an SLA-aware opportunistic job scheduling technique, supported by a machine-learning-based prediction model, to exploit the dynamic workload conditions in the compute and storage clusters. Performance evaluations on an OpenStack Swift cluster and an OpenStack-based virtual cluster of Hadoop VMs built atop NSFCloud's Chameleon testbed show that MPLEX improves Hadoop job throughput by up to 1.7X while maintaining the SLA for cloud storage service requests.
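The SLA-aware opportunistic scheduling idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the prediction model, its inputs, or the scheduling policy. The names `Job`, `choose_cluster`, and `toy_predictor`, the load values, and the latency formula are all hypothetical. The sketch also looks only at storage-cluster load, whereas the abstract says MPLEX considers conditions in both clusters.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    name: str
    # Hypothetical: fraction of storage-cluster capacity the job would consume.
    extra_storage_load: float


def choose_cluster(
    job: Job,
    storage_load: float,
    predict_p99_ms: Callable[[float], float],
    sla_p99_ms: float,
) -> str:
    """Offload to storage only if predicted tail latency stays within the SLA."""
    predicted = predict_p99_ms(storage_load + job.extra_storage_load)
    return "storage" if predicted <= sla_p99_ms else "compute"


def toy_predictor(load: float) -> float:
    # Stand-in for a trained model (an assumption, not from the paper):
    # tail latency rises sharply as load approaches full utilization.
    load = min(load, 0.99)
    return 20.0 / (1.0 - load)


if __name__ == "__main__":
    job = Job("scan", extra_storage_load=0.2)
    # Lightly loaded storage cluster: offloading is predicted to be safe.
    print(choose_cluster(job, 0.3, toy_predictor, sla_p99_ms=100.0))
    # Heavily loaded storage cluster: keep the job on the compute cluster.
    print(choose_cluster(job, 0.7, toy_predictor, sla_p99_ms=100.0))
```

In this sketch the decision is made per job at submission time; a real system would presumably re-evaluate as the workload changes, which the abstract describes only as exploiting "dynamic workload conditions".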
