随着互联网的普及,PB级海量数据的存储、处理需求越来越大,传统数据库和存储架构已不能满足如此大数据量下的快速响应需求。作为一个开源的分布式系统基础架构,Hadoop提供了高可靠性的分布式存储架构和高速的海量数据计算方式,被视为解决海量数据处理瓶颈的有效途径。文章通过搭建Hadoop集群平台对1PB海量数据进行存储、处理,大大提高了系统处理性能。%With the popularization of internet, the needs of petabyte-scale data storage and processing are bigger and bigger, traditional database and storage structure couldn’t meet the quick response based on so large amount of data. As a open source distributed system structure, Hadoop gives high-reliability distributed storage structure and high-speed mass data computing methods, is considered a effective way to resolve the bottleneck of mass data processing. In this paper, we build a hadoop platform to store and process a petabyte data, and the system performance is improved greatly.
展开▼