Journal of Supercomputing

Use case-based evaluation of workflow optimization strategy in real-time computation system

Abstract

With the start of the big data era, data stream computing has emerged as a well-known approach to optimizing data-intensive workflows. Apache STORM is an open-source real-time distributed computation system for processing data streams and has been adopted by well-known organizations such as Twitter, Yahoo, Alibaba, Baidu, and Groupon. Workflows are implemented as topologies in STORM. The main factor that controls the execution performance of a workflow in STORM is the strategy used to schedule the topology components (spouts and bolts). In this paper, we evaluate and analyze the performance of our Partition-based Data-intensive Workflow optimization Algorithm (PDWA) in Apache STORM using a use case workflow, EURExpressII. This is a real-world application workflow that builds a transcriptome-wide atlas of gene expression for the developing mouse embryo, established by ribonucleic acid (RNA) in situ hybridization. Our proposed algorithm, PDWA, partitions the application task graph so that data movement between partitions is minimized. Each partition is then mapped onto one machine for the execution of that partition's tasks, which yields the minimum execution time for that partition. Partial task duplication is also part of the algorithm and further enhances performance. A STORM-based computing cluster deployed in an OpenStack cloud is used as the computing environment. The performance of the PDWA-based optimizer is evaluated with data sets of different sizes. The results show that PDWA improves average execution time by 21% across different data set sizes and varying numbers of execution nodes. In addition, the comparative results show that, on average, the efficiency of PDWA is 20.4% higher than that of the STORM default scheduler (SDS).
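The abstract only outlines the core idea of PDWA: partition the application task graph so that inter-partition data movement is minimal, then map each partition onto one machine. The sketch below is a minimal illustration of that idea, not the paper's PDWA (it omits, for example, partial task duplication); the task names, data volumes, and the simple greedy edge-contraction heuristic are assumptions made purely for illustration.

```python
# Illustrative sketch (not the authors' PDWA): greedily contract the heaviest
# edges of a weighted task graph until k partitions remain, so that edges
# carrying large data volumes stay inside a partition, then map each
# partition to one machine. Task names and data volumes are invented.

# Workflow task graph: (producer, consumer) -> data volume moved (MB).
edges = {
    ("read_images", "preprocess"): 500,
    ("preprocess", "extract_features"): 400,
    ("extract_features", "classify"): 50,
    ("classify", "aggregate"): 5,
}
tasks = {t for e in edges for t in e}

def partition(tasks, edges, k):
    """Greedy partitioning: merge the two partitions joined by the
    heaviest remaining edge until only k partitions are left."""
    part_of = {t: {t} for t in tasks}            # task -> its partition (a set)
    for (u, v), _w in sorted(edges.items(), key=lambda kv: -kv[1]):
        if len({id(p) for p in part_of.values()}) <= k:
            break
        if part_of[u] is not part_of[v]:         # merge the two partitions
            merged = part_of[u] | part_of[v]
            for t in merged:
                part_of[t] = merged
    # Deduplicate the shared partition sets.
    return list({id(p): p for p in part_of.values()}.values())

def cut_cost(parts, edges):
    """Total data moved across partition boundaries."""
    where = {t: i for i, p in enumerate(parts) for t in p}
    return sum(w for (u, v), w in edges.items() if where[u] != where[v])

if __name__ == "__main__":
    parts = partition(tasks, edges, k=2)
    for i, p in enumerate(parts):
        print(f"partition {i} -> machine {i}: {sorted(p)}")
    print("inter-partition data movement (MB):", cut_cost(parts, edges))
```

With two target machines, the heavy image-processing edges end up inside one partition and only the lightweight classify-to-aggregate edge crosses the cut, which is the kind of placement the paper's partitioning step aims for.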
