IEEE Transactions on Parallel and Distributed Systems

runData: Re-Distributing Data via Piggybacking for Geo-Distributed Data Analytics Over Edges

Abstract

Efficiently analyzing geo-distributed datasets is emerging as a major demand in cloud-edge systems. Since datasets are often generated close to end users, existing works mainly focus on offloading proper tasks from hotspot edges to the datacenter, reducing the overall completion time of submitted jobs in a one-shot manner. However, optimizing the completion time of the current job alone is insufficient in the long term, since some datasets are used multiple times. Optimizing the data distribution instead is more efficient and directly benefits forthcoming jobs, although it may postpone the execution of the current one. Unfortunately, due to the throwaway nature of data fetchers, existing data analytics systems fail to re-distribute the corresponding data out of hotspot edges after the analytics finish. To minimize the overall completion time for a sequence of jobs while guaranteeing the performance of the current one, we propose to re-distribute data along with task offloading, and formulate the corresponding ε-bounded data-driven task scheduling problem over a wide area network, taking edge heterogeneity into account. We design an online scheme, runData, which offloads proper tasks, together with the related data via piggybacking, to the datacenter based on carefully calculated probabilities. Through rigorous theoretical analysis, we prove that runData concentrates on its optimum with high probability. We implement runData on Spark and HDFS. Both testbed results and trace-driven simulations show that runData re-distributes proper data via piggybacking and reduces average response time by up to 37 percent compared with state-of-the-art schemes.
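The central trade-off in the abstract (re-distributing data can delay the current job but speeds up later jobs that read the same dataset) can be illustrated with a toy cost model. This is a minimal sketch, not runData: it omits the online, probability-based decisions and edge heterogeneity, and every name and parameter in it (ToyCluster, t_wan, t_store, the (1 + eps) slowdown check) is a hypothetical assumption. In particular, reading ε as a bound on the current job's slowdown relative to offloading without piggybacking is our own illustrative interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCluster:
    """Hypothetical one-edge/one-datacenter setup; all times are in arbitrary units."""
    blocks: int      # input blocks stored on the hotspot edge
    t_edge: float    # time for the busy edge to process one block (blocks run serially)
    t_dc: float      # time for the datacenter to process a block (blocks run in parallel)
    t_wan: float     # time to ship one block over the WAN (transfers are serialized)
    t_store: float   # extra time to persist one shipped block at the datacenter


def job_time(c: ToyCluster, k: int, blocks_at_dc: bool, piggyback: bool) -> float:
    """Completion time of one job that offloads the tasks of k blocks to the datacenter."""
    edge = (c.blocks - k) * c.t_edge  # the edge processes the remaining blocks itself
    if k == 0:
        return edge
    if blocks_at_dc:
        dc = c.t_dc  # data was already re-distributed, so no WAN transfer is needed
    else:
        # Ship each offloaded block; piggybacking also pays to keep a copy at the datacenter.
        per_block = c.t_wan + (c.t_store if piggyback else 0.0)
        dc = k * per_block + c.t_dc
    return max(edge, dc)  # edge and datacenter work proceed in parallel


def allow_piggyback(c: ToyCluster, k: int, eps: float) -> bool:
    """Hypothetical epsilon rule: piggyback only if the current job slows by at most (1 + eps)."""
    baseline = job_time(c, k, blocks_at_dc=False, piggyback=False)
    delayed = job_time(c, k, blocks_at_dc=False, piggyback=True)
    return delayed <= (1 + eps) * baseline


def sequence_time(c: ToyCluster, k: int, n_jobs: int, piggyback: bool) -> float:
    """Sum of completion times of n_jobs jobs that run one after another on the same dataset."""
    total = 0.0
    at_dc = False
    for _ in range(n_jobs):
        total += job_time(c, k, at_dc, piggyback)
        # A throwaway fetcher discards shipped blocks; piggybacking keeps them at the datacenter.
        at_dc = at_dc or (piggyback and k > 0)
    return total


if __name__ == "__main__":
    c = ToyCluster(blocks=10, t_edge=4.0, t_dc=1.0, t_wan=2.0, t_store=0.5)
    k = 7
    print(allow_piggyback(c, k, eps=0.25))                 # True
    print(sequence_time(c, k, n_jobs=5, piggyback=False))  # 75.0
    print(sequence_time(c, k, n_jobs=5, piggyback=True))   # 66.5
```

With these made-up numbers, piggybacking slows the first job from 15.0 to 18.5 time units (about 23 percent, within eps = 0.25). Every later job then finishes in 12.0 instead of 15.0, so five jobs take 66.5 units instead of 75.0. The real scheme makes such decisions online and probabilistically rather than with a fixed k.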