Hadoop is an open-source software platform for distributed computing dealing with a parallel processing of large data sets. It has been widely used in the field of cloud computing. This paper describes the three most crucial parts of Hadoop, including HDFS, the distributed file system, MapReduce, the data processing model, and HBase, the distributed structured data table. The application status, main research directions and existing problems of Hadoop data processing platform are analyzed, and some performance optimization suggestions are given.
展开▼