Conference: IEEE International Congress on Big Data

Spark deployment and performance evaluation on the MareNostrum supercomputer

Abstract

In this paper we present a framework that enables data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present benchmark data that provide insight into the scalability of the system. We examine the impact of different configurations, including parallelism, storage, and networking alternatives, and we discuss several aspects of executing Big Data workloads on a computing system based on the compute-centric paradigm. Further, we derive conclusions that aim to pave the way towards systematic and optimized methodologies for fine-tuning data-intensive applications on large clusters, with an emphasis on parallelism configurations.
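
The abstract's emphasis on parallelism configurations can be illustrated with a minimal sketch of how a sweep over such settings might be enumerated. This is not the authors' framework: the helper `parallelism_grid`, the node count, the cores per node, and the tasks-per-core factors are all hypothetical values chosen for illustration. Only the property names (`spark.executor.instances`, `spark.executor.cores`, `spark.default.parallelism`) are standard Spark configuration keys.

```python
from itertools import product


def parallelism_grid(nodes, cores_per_node, tasks_per_core=(1, 2, 4)):
    """Enumerate hypothetical Spark parallelism settings for a benchmark sweep.

    All inputs are illustrative assumptions, not values from the paper.
    """
    # Keep only executor sizes that evenly divide the cores on one node,
    # so that every core is assigned to exactly one executor.
    executor_sizes = [
        c for c in range(1, cores_per_node + 1) if cores_per_node % c == 0
    ]
    configs = []
    for cores_per_executor, factor in product(executor_sizes, tasks_per_core):
        executors = nodes * (cores_per_node // cores_per_executor)
        configs.append({
            "spark.executor.instances": executors,
            "spark.executor.cores": cores_per_executor,
            # Default partition count expressed as a multiple of total cores.
            "spark.default.parallelism": nodes * cores_per_node * factor,
        })
    return configs


# Assumed cluster slice: 4 nodes with 16 cores each (illustrative only).
grid = parallelism_grid(nodes=4, cores_per_node=16)
for conf in grid[:3]:
    print(conf)
```

Each dictionary could be passed as `--conf key=value` pairs to `spark-submit`, one benchmark run per entry. Varying executor size and partition count separately makes it possible to tell apart the effect of fewer, larger executors from the effect of finer-grained tasks.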
