The Hadoop framework has recently been adopted by the video analytics community for intensive, distributed video processing and storage. A key challenge, however, is estimating the amount of resources required in such an environment to satisfy a user's performance constraints. It is therefore important to understand how to model the performance of a Hadoop-based implementation of video analytic applications in terms of meeting their performance goals. In this paper we propose the use of machine learning approaches to model execution time as a function of the given resources. The prediction is based on parameters typical of video analytic applications: video file characteristics (e.g. resolution, file size, frame rate), cluster resource consumption, and Hadoop configuration values (reducer slots and tasks). The investigation compares different machine learning classifiers with regard to their best obtainable prediction accuracy and shows that a decision-tree-based model (M5P) outperforms a Linear Regression model, while an ensemble classifier, Bagging, outperforms both of these standard single classifiers. This research bridges an existing gap in video-analytics-related performance prediction, where current work focuses on other application types and is largely limited to standard learning algorithms such as SVM, Linear Regression and the Multilayer Perceptron (MLP).
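The modelling approach described above can be illustrated with a minimal sketch in scikit-learn. This is not the paper's implementation: the feature set, synthetic data, and target function are assumptions for illustration only, and since scikit-learn provides no M5P, a `DecisionTreeRegressor` stands in for the decision-tree-based model, with `BaggingRegressor` playing the role of the Bagging ensemble.

```python
# Hypothetical sketch (not the paper's code): predicting Hadoop job
# execution time from video/cluster features using the model families
# compared in the abstract. The features and target are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 500
# Assumed feature set: resolution, file size (MB), frame rate,
# cluster CPU load, and Hadoop reducer-slot count.
X = np.column_stack([
    rng.choice([480, 720, 1080], n),   # vertical resolution
    rng.uniform(50, 2000, n),          # file size (MB)
    rng.choice([24, 30, 60], n),       # frame rate (fps)
    rng.uniform(0.1, 0.9, n),          # cluster CPU load
    rng.integers(1, 16, n),            # reducer slots
])
# Synthetic execution time: work divided by parallelism, plus noise.
y = X[:, 1] * X[:, 2] / (10 * X[:, 4]) * (1 + X[:, 3]) + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "Linear Regression": LinearRegression(),
    "Decision tree (M5P stand-in)": DecisionTreeRegressor(random_state=0),
    "Bagging": BaggingRegressor(DecisionTreeRegressor(), random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mae = mean_absolute_error(y_te, model.predict(X_te))
    print(f"{name}: MAE = {mae:.1f} s")
```

In practice the features would come from measured job runs rather than a synthetic generator, and the reported accuracies would be obtained via cross-validation rather than a single hold-out split.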