Assessing the Impact of Concurrent Replication with Canceling in Parallel Jobs

Abstract

Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks fails if any one of its tasks fails, which requires reprocessing and causes additional delays. In this paper, we explore the effect that the replication of parallel jobs has on job reliability and response time, as well as on resource utilization. The replication mechanism consists of processing replicas concurrently, at either the job or the task level, retrieving the result of the replica that finishes first, if any, and canceling any replicas still in process. We propose a stochastic model that explicitly considers parallel job processing and replication at both the job and the task level, and that handles general arrival processes. We develop a numerically efficient algorithm to solve large-scale instances of the model and compute key performance metrics. We observe that the task cancellation mechanism offers an effective way of limiting the increase in resource utilization, allowing the use of replicas that not only increase job reliability but also have the potential to reduce response times.
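The replication mechanism described in the abstract (run replicas concurrently, keep the result of the first one that finishes successfully, cancel the rest) can be illustrated with a minimal Python sketch. This is not the stochastic model or the algorithm from the paper; the names `replica` and `run_with_cancel`, and the use of fixed delays and failure flags to stand in for task behavior, are assumptions made purely for illustration.

```python
import asyncio


async def replica(replica_id: int, delay: float, fails: bool) -> int:
    """Hypothetical replica: works for `delay` seconds, then succeeds or fails."""
    await asyncio.sleep(delay)
    if fails:
        raise RuntimeError(f"replica {replica_id} failed")
    return replica_id


async def run_with_cancel(specs):
    """Run all replicas concurrently.

    Returns the id of the first replica that finishes successfully,
    or None if every replica fails. Replicas still in process when a
    result is obtained are canceled.
    """
    pending = {
        asyncio.create_task(replica(i, delay, fails))
        for i, (delay, fails) in enumerate(specs)
    }
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
        return None  # every replica failed
    finally:
        # Cancel whatever is still running and wait for the cancellations
        # to complete, so no replica keeps consuming resources.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


if __name__ == "__main__":
    # Replica 1 fails early, replica 2 succeeds next, replica 0 is canceled.
    winner = asyncio.run(
        run_with_cancel([(0.3, False), (0.01, True), (0.1, False)])
    )
    print("first successful replica:", winner)
```

In this sketch, cancellation is what bounds the extra work: once one replica succeeds, the remaining ones stop instead of running to completion, which is the effect on resource utilization that the abstract attributes to the cancellation mechanism.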
