JointCloud provides a large-scale, flexible, and elastic computing resource platform, and big data systems such as MapReduce and Spark are widely deployed on it for big data processing. Choosing a cloud platform that matches customers' needs is a challenge. Current performance benchmarking suites can help customers choose suitable cloud platforms, but they do not consider the reliability of applications running atop big data systems. These systems are highly scalable, yet the applications running atop them often generate runtime errors, such as out-of-memory errors, I/O exceptions, and task timeouts. Users want to know whether the applications they develop contain potential application faults. System designers and managers want to know whether deployed or updated systems contain potential system faults. Existing benchmarks for big data systems, however, are designed only for performance testing. To fill this gap, we propose a reliability benchmark that contains representative applications, an abnormal data generator, and a configuration combination generator. Unlike performance benchmarks, this benchmark (1) generates abnormal test data according to application characteristics, and (2) reduces the configuration combination space based on configuration features. We have implemented this benchmark on Spark. In our preliminary test, we found three types of errors (out-of-memory errors, timeouts, and wrong results) in five SQL, machine learning, and graph applications.
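The two generators can be illustrated with a minimal Python sketch. The abstract does not specify how abnormal data is constructed or how configuration features drive the reduction, so everything here is an assumption made for illustration: the grouping of parameters into "memory" and "parallelism" features, the strategy of varying one feature group at a time against a baseline, the use of key skew as the abnormal data pattern, and the helper names `full_combinations`, `reduced_combinations`, and `skewed_keys`. The parameter names are real Spark settings, but the candidate values are illustrative only.

```python
from itertools import product
import random

# Candidate values per parameter (illustrative values, not from the paper).
CONFIG_SPACE = {
    "spark.executor.memory": ["1g", "4g"],
    "spark.memory.fraction": ["0.3", "0.6"],
    "spark.sql.shuffle.partitions": ["8", "200"],
    "spark.default.parallelism": ["8", "64"],
}

# Assumed grouping of parameters by the feature they influence.
FEATURE_GROUPS = {
    "memory": ["spark.executor.memory", "spark.memory.fraction"],
    "parallelism": ["spark.sql.shuffle.partitions", "spark.default.parallelism"],
}


def full_combinations(space):
    """Enumerate the full Cartesian product of all parameter values."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def reduced_combinations(space, groups):
    """Vary one feature group at a time; other parameters stay at a baseline.

    The baseline is the first listed value of each parameter (an assumption).
    """
    baseline = {name: values[0] for name, values in space.items()}
    seen = set()
    result = []
    for params in groups.values():
        for values in product(*(space[p] for p in params)):
            combo = dict(baseline)
            combo.update(zip(params, values))
            key = tuple(sorted(combo.items()))
            if key not in seen:  # the baseline combination appears in every group
                seen.add(key)
                result.append(combo)
    return result


def skewed_keys(n, hot_fraction=0.9, seed=0):
    """Generate n keys in which roughly hot_fraction share one hot key.

    Key skew is one assumed form of abnormal data; it tends to overload
    a single task in shuffle-heavy jobs.
    """
    rng = random.Random(seed)
    return [
        "hot" if rng.random() < hot_fraction else f"k{rng.randrange(n)}"
        for _ in range(n)
    ]


if __name__ == "__main__":
    print(len(full_combinations(CONFIG_SPACE)))
    print(len(reduced_combinations(CONFIG_SPACE, FEATURE_GROUPS)))
    print(skewed_keys(10))
```

With two groups of two binary parameters each, the full product has 16 combinations, while the per-group strategy yields 7 (four per group, with the shared baseline counted once). The trade-off is that interactions between parameters in different groups are not exercised.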