IEEE/ACM International Conference on Big Data Computing Applications and Technologies

Spark-DIY: A Framework for Interoperable Spark Operations with High Performance Block-Based Data Models



Abstract

Today's scientific applications are increasingly relying on a variety of data sources, storage facilities, and computing infrastructures, and there is a growing demand for data analysis and visualization for these applications. In this context, exploiting Big Data frameworks for scientific computing is an opportunity to incorporate high-level libraries, platforms, and algorithms for machine learning, graph processing, and streaming; inherit their data awareness and fault-tolerance; and increase productivity. Nevertheless, limitations exist when Big Data platforms are integrated with an HPC environment, namely poor scalability, severe memory overhead, and huge development effort. This paper focuses on a popular Big Data framework -Apache Spark- and proposes an architecture to support the integration of highly scalable MPI block-based data models and communication patterns with a map-reduce-based programming model. The resulting platform preserves the data abstraction and programming interface of Spark, without conducting any changes in the framework, but allows the user to delegate operations to the MPI layer. The evaluation of our prototype shows that our approach integrates Spark and MPI efficiently at scale, so end users can take advantage of the productivity facilitated by the rich ecosystem of high-level Big Data tools and libraries based on Spark, without compromising efficiency and scalability.
