Solving for the extrema of multimodal functions with a genetic algorithm (GA) requires many iterations, so running efficiency becomes too low on large data samples, which greatly limits the GA's practical application. The classical parallel platform Hadoop can improve GA running efficiency to some extent, while the newer parallel platform Spark can exploit more of the GA's parallel potential. This paper designs and implements a Spark-based parallel GA that performs crossover, mutation, and other operations on the individuals of each subpopulation in parallel across the worker nodes, so that the population evolves in a highly parallel manner and the extrema of multimodal functions are found efficiently. For comparison, corresponding algorithms are also designed and implemented on a single machine and on Hadoop. Experimental results show that, on large data samples, the Spark-based parallel GA significantly reduces the time needed to find the extrema compared with the single-machine and Hadoop implementations, and substantially improves efficiency. In addition, the strong randomness introduced by parallel computation effectively avoids the loss of population diversity that leads to premature convergence, which improves the accuracy of the algorithm.
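The subpopulation scheme summarized in the abstract can be illustrated with a minimal single-process sketch in plain Python. It is not the paper's implementation and does not use Spark: each "island" stands in for the subpopulation that one worker node would evolve independently. The test function `x * sin(10πx) + 2` on [-1, 2], the operator choices (tournament selection, arithmetic crossover, Gaussian mutation, elitism), and all names and parameter values are assumptions made for illustration only.

```python
import math
import random

# Hypothetical search interval and multimodal test function (not from the paper).
LOW, HIGH = -1.0, 2.0


def fitness(x):
    """Multimodal objective to maximize; many local peaks on [LOW, HIGH]."""
    return x * math.sin(10 * math.pi * x) + 2.0


def evolve_subpopulation(pop, generations, rng, mutation_rate=0.1):
    """Evolve one subpopulation independently (the per-node work)."""
    pop = list(pop)
    for _ in range(generations):
        def pick():
            # Binary tournament selection.
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) >= fitness(b) else b

        # Elitism: carry the current best individual over unchanged.
        children = [max(pop, key=fitness)]
        while len(children) < len(pop):
            p1, p2 = pick(), pick()
            w = rng.random()
            child = w * p1 + (1 - w) * p2        # arithmetic crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0, 0.1)       # Gaussian mutation
            children.append(min(HIGH, max(LOW, child)))  # clamp to bounds
        pop = children
    return pop


def parallel_ga(num_islands=4, island_size=50, generations=40, seed=0):
    """Evolve several subpopulations separately, then merge and take the best."""
    rng = random.Random(seed)
    islands = [
        [rng.uniform(LOW, HIGH) for _ in range(island_size)]
        for _ in range(num_islands)
    ]
    # Each island gets its own random stream, mimicking independent workers.
    evolved = [
        evolve_subpopulation(island, generations, random.Random(seed + i + 1))
        for i, island in enumerate(islands)
    ]
    best = max((x for island in evolved for x in island), key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    best_x, best_f = parallel_ga()
    print(f"best x = {best_x:.4f}, f(x) = {best_f:.4f}")
```

On Spark, a function shaped like `evolve_subpopulation` could plausibly be applied to each partition of a distributed population (for example via `RDD.mapPartitions`), though the abstract does not say which operators the authors used. Giving each island its own random stream loosely mirrors the abstract's point that independent parallel evolution adds randomness that helps against premature convergence.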