Journal: Distributed and Parallel Databases

Optimization of data flow execution in a parallel environment

Abstract

Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or in the cloud, current cost models, such as those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because the impact of parallelism, and more specifically the impact of concurrent task execution on running time, is not adequately modeled in current cost models. The contribution of this work is twofold. Firstly, we propose an advanced cost model that aims to reflect more accurately the response time of a data flow that is executed in parallel. Secondly, we show that existing optimization solutions are inadequate and develop new optimization techniques targeting the proposed cost model. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified. Furthermore, we propose extensions to current optimizers that decide on the exact ordering of flow tasks, taking into account the new optimization metric. Finally, we evaluate the new optimization algorithms and show up to 59% response time improvement over state-of-the-art task ordering techniques.
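To illustrate why a metric that accounts for time overlaps can prefer a different task ordering than a sum-of-costs metric, the following is a minimal sketch and not the cost model proposed in the paper. It assumes a linear flow of filter-like tasks, each with a per-row cost and a selectivity, and it approximates response time by the standard makespan lower bound (the larger of the bottleneck task and the total work divided by the number of cores). The task names, the `contention` parameter, and all function names are hypothetical.

```python
from itertools import permutations

# Hypothetical tasks: per-row processing cost and selectivity
# (fraction of input rows each task passes downstream).
COST = {"x": 4.0, "y": 1.0}
SEL = {"x": 0.125, "y": 0.875}


def task_work(order, cost, sel, rows_in=1.0):
    """Work done by each task when the flow runs in the given order."""
    work = []
    rows = rows_in
    for task in order:
        work.append(rows * cost[task])
        rows *= sel[task]
    return work


def sum_cost(order, cost, sel):
    """Sum-of-costs metric: total work, ignoring any overlap in time."""
    return sum(task_work(order, cost, sel))


def toy_response_time(order, cost, sel, cores, contention=0.0):
    """Toy response-time estimate for tasks that run concurrently.

    Assumption: tasks overlap in time, so the estimate is the makespan
    lower bound max(bottleneck task, total work / cores). The optional
    'contention' factor is a made-up penalty that slows every task when
    more tasks than cores are running at once.
    """
    oversubscribed = max(0, len(order) - cores)
    slowdown = 1.0 + contention * oversubscribed
    work = [w * slowdown for w in task_work(order, cost, sel)]
    return max(max(work), sum(work) / cores)


def best_order(metric, cost):
    """Exhaustively pick the ordering that minimizes the given metric."""
    return min(permutations(cost), key=metric)


best_by_sum = best_order(lambda o: sum_cost(o, COST, SEL), COST)
best_by_time = best_order(
    lambda o: toy_response_time(o, COST, SEL, cores=2), COST
)
```

In this toy instance the sum-of-costs metric prefers running `x` first (total work 4.125 versus 4.5), while the overlap-aware estimate on two cores prefers running `y` first (3.5 versus 4.0), because that ordering has the cheaper bottleneck task. With a single core and no contention the two metrics coincide, which is consistent with the abstract's point that the difference stems from parallel execution.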