Journal: IEEE Transactions on Parallel and Distributed Systems

Exploiting Analytics Shipping with Virtualized MapReduce on HPC Backend Storage Servers

Abstract

Large-scale scientific applications on High-Performance Computing (HPC) systems generate colossal amounts of data that need to be analyzed in a timely manner for new knowledge, but that are too costly to transfer because of their sheer size. Many HPC systems support in situ analytics solutions that can analyze temporary datasets as they are generated, i.e., without storing them to long-term storage media. However, how to efficiently analyze permanent datasets, which have been stored to backend persistent storage because of their long-term value, remains an open question. To fill this void, we exploit the analytics shipping model for fast analysis of large-scale scientific datasets on HPC backend storage servers. Through an efficient integration of MapReduce and the popular Lustre storage system, we have developed a Virtualized Analytics Shipping (VAS) framework that can ship MapReduce programs to Lustre storage servers. The VAS framework includes three component techniques: (a) virtualized analytics shipping with fast network and disk I/O; (b) stripe-aligned data distribution and task scheduling; and (c) pipelined intermediate data merging and reducing. The first technique provides the necessary isolation between MapReduce analytics and Lustre I/O services. The second and third techniques optimize MapReduce on Lustre and avoid explicit shuffling. Our performance evaluation demonstrates that VAS offers an exemplary implementation of analytics shipping and delivers fast, virtualized MapReduce programs on backend Lustre storage servers.
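To make technique (b) more concrete, the following is a minimal sketch of what stripe-aligned input splitting could look like. It is not the VAS implementation, which the abstract does not describe in detail. It assumes only the standard round-robin (RAID-0 style) striping that Lustre applies to a file, and the function names `stripe_index` and `stripe_aligned_splits` are hypothetical. The index computed here is the position within the file's stripe layout, not an actual storage target ID; on a real system the layout would be read from the file system (for example with `lfs getstripe`).

```python
from collections import defaultdict


def stripe_index(offset, stripe_size, stripe_count):
    """Return which stripe object (0 .. stripe_count-1) holds the byte at `offset`.

    Assumes plain round-robin striping: consecutive stripe_size chunks
    are placed on consecutive stripe objects, wrapping around.
    """
    return (offset // stripe_size) % stripe_count


def stripe_aligned_splits(file_size, stripe_size, stripe_count):
    """Cut a file into stripe-sized splits and group them by stripe object.

    Each split is a (start_offset, length) pair that lies entirely inside
    one stripe, so a map task scheduled next to that stripe object could
    read its whole input locally (hypothetical scheduling policy).
    """
    plan = defaultdict(list)
    for start in range(0, file_size, stripe_size):
        length = min(stripe_size, file_size - start)  # last split may be short
        plan[stripe_index(start, stripe_size, stripe_count)].append((start, length))
    return dict(plan)


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Illustrative values only: a 10 MiB file, 1 MiB stripes, 4 stripe objects.
    for idx, splits in sorted(stripe_aligned_splits(10 * MiB, MiB, 4).items()):
        print(idx, [start // MiB for start, _ in splits])
```

Because every split falls within a single stripe, no map task in this sketch needs to read across storage targets. Whether VAS sizes its splits exactly this way is an assumption.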