Clusters of commodity microprocessors have overtaken custom-designed systems as the high performance computing (HPC) platform of choice. The design and optimization of workload scheduling systems for clusters has been an active research area. This paper surveys some examples of workload scheduling methods used in large-scale applications such as Google, Yahoo, and Amazon that use a MapReduce parallel processing framework. It examines a specific MapReduce framework, Hadoop, in some detail. It describes a novel dynamic prioritization, self-tuning workload scheduler, and provides simulation results that suggest the approach will improve performance compared to standard Hadoop scheduling.
展开▼