MapReduce is a widely used batch data processing framework in cloud computing. Sharing a MapReduce cluster among multiple jobs while meeting each job's deadline is a pressing problem for scheduling algorithms. This paper proposes a two-phase real-time scheduling algorithm that divides scheduling into job scheduling (between jobs) and task scheduling (within a job). For job scheduling, it estimates task execution time by sampling and from empirical values, uses that estimate to build a resource allocation model that determines how many slots to assign to each job, and dynamically computes job priorities. For task scheduling, it uses a delay-scheduling scheme to preserve computing locality. Experimental results show that, compared with the fair scheduling and FIFO algorithms, the two-phase real-time scheduling algorithm meets jobs' real-time requirements while maintaining cluster throughput.
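The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's model: the abstract does not give the resource allocation formula, so the wave-based slot estimate below, the function names, and the dictionary layout of jobs and tasks are all assumptions made for illustration.

```python
import math
from statistics import mean


def estimate_task_time(sampled_durations):
    """Estimate per-task execution time as the mean of sampled task runs (assumed estimator)."""
    return mean(sampled_durations)


def slots_needed(remaining_tasks, task_time, time_to_deadline):
    """Smallest slot count that finishes the remaining tasks before the deadline.

    Assumption: tasks run in waves of length task_time, so s slots need
    ceil(remaining_tasks / s) waves. Returns None if no task fits before the deadline.
    """
    if remaining_tasks == 0:
        return 0
    waves = math.floor(time_to_deadline / task_time)  # waves that fit before the deadline
    if waves < 1:
        return None
    return math.ceil(remaining_tasks / waves)


def pick_task(job, free_node, max_skips):
    """Delay scheduling: prefer a task whose data is on free_node.

    A non-local task is launched only after the job has been skipped
    max_skips times (hypothetical threshold parameter).
    """
    for task in job["pending"]:
        if free_node in task["local_nodes"]:
            job["skips"] = 0
            job["pending"].remove(task)
            return task
    if job["skips"] >= max_skips and job["pending"]:
        job["skips"] = 0
        return job["pending"].pop(0)
    job["skips"] += 1
    return None
```

For example, with sampled durations of 9, 10, and 11 seconds, 100 remaining tasks, and 50 seconds to the deadline, five waves fit and `slots_needed` returns 20. A job scheduler could then rank jobs by how their required slots compare with the slots available, though the abstract does not state the priority rule the paper uses.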