Journal: Distributed and Parallel Databases

Optimization of data flow execution in a parallel environment

Abstract

Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or in the cloud, current cost models, such as those used by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because current cost models do not adequately capture the impact of parallelism, and more specifically the effect of concurrent task execution on running time. The contribution of this work is twofold. First, we propose an advanced cost model that aims to reflect more accurately the response time of a data flow executed in parallel. Second, we show that existing optimization solutions are inadequate, and we develop new optimization techniques targeting the proposed cost model. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is illustrated with examples. Furthermore, we propose extensions to current optimizers that decide on the exact ordering of flow tasks, taking the new optimization metric into account. Finally, we evaluate the new optimization algorithms and show up to 59% improvement in response time over state-of-the-art task ordering techniques.
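The following Python sketch illustrates why a task-ordering decision can change once parallel execution is modeled. It compares a classic sum-of-costs model with a deliberately simplified pipelined response-time model. This is not the cost model proposed in the paper. The model, the task names `A` and `B`, and their per-tuple costs and selectivities are hypothetical assumptions, chosen only to show the effect.

The sketch makes three assumptions. Tasks are commutative filters. Each task's work is proportional to the fraction of tuples that reach it. The "parallel" metric takes the larger of the slowest stage (time overlap) and the total work divided by the number of cores (a crude proxy for contention).

```python
from itertools import permutations

# Hypothetical per-task parameters (illustrative only, not from the paper):
# COST = processing time per input tuple, SEL = fraction of tuples passed on.
COST = {"A": 1.0, "B": 3.0}
SEL = {"A": 0.9, "B": 0.2}


def stage_work(order, cost, sel, n_tuples=1.0):
    """Work of each task when run in `order`, assuming each task only
    processes the tuples that survived all earlier tasks."""
    work, fraction = [], 1.0
    for task in order:
        work.append(n_tuples * fraction * cost[task])
        fraction *= sel[task]
    return work


def sum_cost(order, cost, sel):
    """Sequential cost model: response time is the sum of all task work."""
    return sum(stage_work(order, cost, sel))


def parallel_response_time(order, cost, sel, cores):
    """Simplified pipelined model (an assumption, not the paper's model):
    overlapping tasks are bounded by the slowest stage, and tasks competing
    for `cores` are bounded by total work spread evenly over the cores."""
    work = stage_work(order, cost, sel)
    return max(max(work), sum(work) / cores)


def best_order(tasks, metric):
    """Exhaustive search over all orderings; only feasible for a few tasks."""
    return min(permutations(tasks), key=metric)


tasks = list(COST)
print(best_order(tasks, lambda o: sum_cost(o, COST, SEL)))                         # ('B', 'A')
print(best_order(tasks, lambda o: parallel_response_time(o, COST, SEL, cores=2)))  # ('A', 'B')
print(best_order(tasks, lambda o: parallel_response_time(o, COST, SEL, cores=1)))  # ('B', 'A')
```

The two models choose different plans:

- **Sum-of-costs model:** it puts the highly selective task `B` first (3.2 vs. 3.7 time units).
- **Pipelined model, two cores:** it prefers the cheap but weakly selective task `A` first, because that keeps the bottleneck stage shorter (2.7 vs. 3.0).
- **Single core:** the contention bound equals the total work, so the two models agree again.

Exhaustive search is used only for clarity. It does not scale beyond a handful of tasks.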