Applying big data paradigms to a large scale scientific workflow: Lessons learned and future directions


Abstract

The increasing amount of data involved in the execution of scientific workflows has raised awareness of their shift towards parallel, data-intensive problems. In this paper, we report our experience combining traditional high-performance computing and grid-based approaches with Big Data analytics paradigms in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements of the Big Data ecosystem: the data-centric programming model, and the underlying infrastructure that integrates storage and computation in each node. We experimented with EnKF-HGS, a representative MPI-based iterative workflow from the hydrology domain, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2). We analysed the results to synthesize the lessons learned from this experience and to discuss promising directions for further research.

Bibliographic record

  • Source
    Future Generation Computer Systems | 2020, No. 9 | pp. 440-452 | 13 pages
  • Author affiliations

    Department of Computer Science, University Carlos Ⅲ of Madrid, Avda. Universidad 30, Leganes 28911, Madrid, Spain;

    Computer Science Department (IIUN), University of Neuchatel, Rue Emile-Argand 11, CP 158, Neuchatel 2000, Switzerland;

    Department of Computer Science, University Carlos Ⅲ of Madrid, Avda. Universidad 30, Leganes 28911, Madrid, Spain;

    Computer Science Department (IIUN), University of Neuchatel, Rue Emile-Argand 11, CP 158, Neuchatel 2000, Switzerland;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification
  • Keywords

    Scientific workflows; Big data; Cloud computing; Apache Spark; Hydrology
