Conference: IEEE International Congress on Big Data
Composable and efficient functional big data processing framework

Abstract

Over the past years, frameworks such as MapReduce and Spark have been introduced to ease the task of developing big data programs and applications. However, jobs in these frameworks are roughly defined and packaged as executable jars, with none of their functionality exposed or described. As a result, deployed jobs are not natively composable or reusable in subsequent development. This also hampers the ability to apply optimizations to the data flow of job sequences and pipelines. In this paper, we present the Hierarchically Distributed Data Matrix (HDM), a functional, strongly-typed data representation for writing composable big data applications. Along with HDM, a runtime framework is provided to support the execution of HDM applications on distributed infrastructures. Based on the functional data dependency graph of HDM, multiple optimizations are applied to improve the performance of executing HDM jobs. The experimental results show that our optimizations can achieve improvements of between 10% and 60% in Job-Completion-Time for different types of operation sequences when compared with the current state of the art, Apache Spark.
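To make the idea of composable, typed jobs more concrete, here is a minimal sketch in Python. It is not the HDM API: the `Stage` class, its `then` and `run` methods, and the `parse` and `square` stages are all hypothetical names chosen for illustration. The sketch only shows how exposing a job's function, rather than hiding it in an opaque jar, lets two jobs be composed and fused into a single pass over the data, which is one kind of data-flow optimization the abstract alludes to.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Stage(Generic[A, B]):
    """Hypothetical typed job stage: a named per-element function from A to B."""
    name: str
    fn: Callable[[A], B]

    def then(self, nxt: "Stage[B, C]") -> "Stage[A, C]":
        # Composing two stages fuses their functions into one, so running
        # the result needs a single pass and no intermediate collection.
        return Stage(f"{self.name}->{nxt.name}", lambda x: nxt.fn(self.fn(x)))

    def run(self, data: Iterable[A]) -> List[B]:
        # Local, in-memory execution; a real framework would distribute this.
        return [self.fn(x) for x in data]


# Two independently defined stages (illustrative only).
parse = Stage("parse", int)
square = Stage("square", lambda n: n * n)

# Reuse both existing stages as one composed job.
pipeline = parse.then(square)
result = pipeline.run(["1", "2", "3"])
print(pipeline.name, result)
```

Because each stage carries its input and output types, a type checker can reject a composition whose types do not line up, and each stage remains reusable on its own, for example `square.run([2, 3])`.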
