Big data technology is widely used for analyzing large volumes of data. Wide acceptance of the open source Hadoop platform encourages its use for real-time analytics as well, which requires high performance from the system. Moreover, most High Performance Computing (HPC) applications may also use data analytics to improve their execution time by reducing the number of simulation cycles. HDFS is the traditional file system used with Hadoop, while Lustre is one of the file systems popularly used in HPC systems. Can the same HPC setup be used for data analytics as well? This paper addresses this question by comparing the performance of Hive SQL and Map-Reduce jobs executed on the Lustre and HDFS file systems. The systems are evaluated for Financial, Telecom and Insurance applications on Intel HPDA clusters. The results presented in the paper show that application performance on Lustre is at least twice as good as on HDFS. The paper also discusses the impact of horizontal and vertical scaling of the cluster on the performance of applications deployed on the Lustre and HDFS file systems.