Map-Reduce is a popular framework for very large-scale data mining and processing. Several recent works have attempted to model the behavior of Map-Reduce, but these models ignore the non-linearity of disk I/O performance under contention, which is critical to estimating the performance of data-intensive applications. Understanding I/O interference between tasks running on the same node is likewise critical to optimizing task scheduling for improved resource utilization. In this paper, we present a model that estimates the I/O behavior of Map-Reduce applications and can be used for both purposes: performance estimation and scheduling optimization.