The MapReduce framework has become the state-of-the-art paradigm for large-scale data processing. In our ongoing work, we attempt to solve three interrelated problems: how to build an accurate MapReduce performance model, how to use it to automatically detect and optimize slow-running MapReduce jobs, and how to use it to help the scheduler arrange the job execution sequence. Currently, we mainly study the job execution time model and its training method. We also present several policies for optimizing the job configuration and the scheduler.
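To make the notion of a job execution time model concrete, the sketch below shows one common phase-additive formulation in which the job runs as waves of map tasks followed by waves of shuffle-plus-reduce tasks. This is a minimal illustrative assumption, not the model studied in this work; every parameter name (split size, per-task processing rates, map selectivity) is hypothetical and would in practice be fitted from job profiles during training.

```python
import math

def job_execution_time(
    input_bytes: float,            # total job input size
    map_slots: int,                # concurrent map slots in the cluster
    reduce_slots: int,             # concurrent reduce slots in the cluster
    split_bytes: float = 128e6,    # input split size (one split per map task)
    map_rate: float = 50e6,        # bytes/s processed by one map task (assumed)
    shuffle_rate: float = 30e6,    # bytes/s shuffled to one reduce task (assumed)
    reduce_rate: float = 40e6,     # bytes/s processed by one reduce task (assumed)
    selectivity: float = 0.5,      # ratio of map output size to map input size
    num_reduces: int = 8,          # configured number of reduce tasks
) -> float:
    """Estimate wall-clock time as map waves plus shuffle/reduce waves.

    Hypothetical phase-additive model: tasks in one wave run in parallel,
    so job time grows with the number of waves, not the number of tasks.
    """
    num_maps = math.ceil(input_bytes / split_bytes)
    map_task_time = split_bytes / map_rate
    map_waves = math.ceil(num_maps / map_slots)

    # Map output is partitioned evenly across reduce tasks.
    shuffle_bytes = input_bytes * selectivity / num_reduces
    reduce_task_time = shuffle_bytes / shuffle_rate + shuffle_bytes / reduce_rate
    reduce_waves = math.ceil(num_reduces / reduce_slots)

    return map_waves * map_task_time + reduce_waves * reduce_task_time

# Example: a 10 GB job on a cluster with 20 map slots and 8 reduce slots.
print(job_execution_time(10e9, map_slots=20, reduce_slots=8))
```

A model of this shape makes the optimization policies mentioned above plausible: since predicted time is an explicit function of configuration knobs such as the number of reduce tasks and the split size, a tuner can search over those knobs, and a scheduler can use the predicted durations to order jobs.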