TomusBlobs: scalable data-intensive processing on Azure clouds

Costan Alexandru; Tudoran Radu; Antoniu Gabriel; Brasche Goetz

Abstract

The emergence of cloud computing has brought the opportunity to use large-scale compute infrastructures for an ever broader spectrum of applications and users. Although the cloud paradigm is attractive for the ‘elasticity’ of its resource usage and associated costs (users pay only for the resources they actually use), cloud applications still suffer from the high latencies and low performance of cloud storage services. As Big Data analysis on clouds becomes increasingly relevant in many application areas, enabling high-throughput processing of massive cloud data becomes a critical issue, because it affects overall application performance. In this paper, we address this challenge at the level of cloud storage. We introduce a concurrency-optimized data storage system, called TomusBlobs, which federates the virtual disks associated with the virtual machines running the application code on the cloud. We demonstrate the performance benefits of our solution for efficient data-intensive processing by building an optimized prototype MapReduce framework for Microsoft's Azure cloud platform on top of TomusBlobs. Finally, we specifically address the limitations of state-of-the-art MapReduce frameworks for reduce-intensive workloads by proposing MapIterativeReduce as an extension of the MapReduce model. We validate these contributions through large-scale experiments with synthetic benchmarks and real-world applications on the Azure commercial cloud, using resources distributed across multiple data centers; the experiments demonstrate that our solutions bring substantial benefits to data-intensive applications compared with approaches relying on state-of-the-art cloud object storage. Copyright © 2013 John Wiley & Sons, Ltd.

Bibliographic record

Source
Concurrency and computation: practice and experience | 2016, Issue 4 | pp. 950-976 | 27 pages
Authors
Costan Alexandru; Tudoran Radu; Antoniu Gabriel; Brasche Goetz
Author affiliations

Inria Rennes ‐ Bretagne Atlantique 35042 Rennes France;

Inria Rennes ‐ Bretagne Atlantique 35042 Rennes France;

Inria Rennes ‐ Bretagne Atlantique 35042 Rennes France;

Microsoft Advanced Technology Labs Europe EMIC 52072 Aachen Germany;

Original format: PDF
Text language: English
Keywords
big data; cloud computing; data‐intensive processing; cloud storage; MapReduce; scientific applications; Azure;

