In cloud storage centers, file block replicas may be lost due to node failures, which degrades both system reliability and the efficiency of concurrent file access. Hadoop's default replica copy algorithm has several deficiencies: the data transfer process is concentrated on a few DataNodes, the load is imbalanced, and disk I/O throughput is low. To address these issues, this paper proposes a rapid replica copy algorithm for Hadoop based on block popularity. The algorithm replicates the most popular blocks first and selects the source and destination DataNodes for each copy appropriately. Simulation results show that the proposed algorithm balances the system workload, improves disk I/O throughput, and significantly reduces the average service response time.
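The scheduling idea described above (copy hot blocks first; pick lightly loaded source and destination DataNodes) can be sketched as follows. This is only an illustrative sketch, not the paper's actual implementation: the function name `schedule_replication`, the popularity scores, and the load counters are all hypothetical, and real HDFS re-replication is driven by the NameNode's block management logic.

```python
def schedule_replication(lost_blocks, replica_map, node_load):
    """Plan re-replication of lost block replicas, hottest blocks first.

    lost_blocks: dict block_id -> popularity (e.g. recent access count); hypothetical metric
    replica_map: dict block_id -> set of DataNodes still holding a replica
    node_load:   dict DataNode -> number of transfers already scheduled on it
    Returns a list of (block_id, source, destination) in scheduling order.
    """
    plan = []
    # Handle the most popular (hottest) blocks first.
    for block in sorted(lost_blocks, key=lost_blocks.get, reverse=True):
        holders = replica_map[block]
        # Source: the surviving replica holder with the lightest current load.
        src = min(holders, key=lambda n: node_load[n])
        # Destination: the lightest-loaded DataNode that has no replica yet,
        # which spreads transfers instead of concentrating them on a few nodes.
        dst = min((n for n in node_load if n not in holders),
                  key=lambda n: node_load[n])
        node_load[src] += 1
        node_load[dst] += 1
        plan.append((block, src, dst))
    return plan
```

Because the load counters are updated as copies are scheduled, no single DataNode accumulates all transfers, which is the load-balancing behavior the abstract claims.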