IEEE/ACM International Conference on Big Data Computing Applications and Technologies

Spark-DIY: A Framework for Interoperable Spark Operations with High Performance Block-Based Data Models



Abstract

Today's scientific applications are increasingly relying on a variety of data sources, storage facilities, and computing infrastructures, and there is a growing demand for data analysis and visualization for these applications. In this context, exploiting Big Data frameworks for scientific computing is an opportunity to incorporate high-level libraries, platforms, and algorithms for machine learning, graph processing, and streaming; inherit their data awareness and fault-tolerance; and increase productivity. Nevertheless, limitations exist when Big Data platforms are integrated with an HPC environment, namely poor scalability, severe memory overhead, and huge development effort. This paper focuses on a popular Big Data framework -Apache Spark- and proposes an architecture to support the integration of highly scalable MPI block-based data models and communication patterns with a map-reduce-based programming model. The resulting platform preserves the data abstraction and programming interface of Spark, without conducting any changes in the framework, but allows the user to delegate operations to the MPI layer. The evaluation of our prototype shows that our approach integrates Spark and MPI efficiently at scale, so end users can take advantage of the productivity facilitated by the rich ecosystem of high-level Big Data tools and libraries based on Spark, without compromising efficiency and scalability.
