An attribute reduction algorithm is employed to apply two rounds of reduction to the data samples to be classified: attribute reduction of the initial decision table, followed by a second reduction based on core attribute values. Attribute reduction removes redundant data from the data set, which in turn improves the classification accuracy of the KNN algorithm. Building on this, the MapReduce parallel programming model is used to run parallel classification experiments on a Hadoop cluster. The experimental results show that the improved algorithm executes much more efficiently in the cluster environment and processes the experimental data efficiently. The speedup of the experiments also improves noticeably.
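The two ideas in the abstract, rough-set attribute reduction followed by MapReduce-style parallel KNN, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the paper's algorithm. The condition attributes are assumed to be discrete and integer-coded. The reduct is found with the standard positive-region dependency heuristic (core attributes first, then greedy forward selection), which stands in for the paper's two-stage reduction. Hadoop is simulated in memory by hypothetical `knn_map`/`knn_reduce` functions rather than the Hadoop Mapper/Reducer API. All function names and the toy data are illustrative.

```python
from collections import Counter, defaultdict
import heapq
import math

# --- Rough-set attribute reduction (generic heuristic, assumed, not the paper's exact method) ---

def partition(rows, attrs):
    """Group row indices into equivalence classes of the indiscernibility relation IND(attrs)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return classes.values()

def dependency(rows, labels, attrs):
    """gamma_attrs(d) = |POS_attrs(d)| / |U|: share of rows in decision-consistent classes."""
    pos = sum(len(cls) for cls in partition(rows, attrs)
              if len({labels[i] for i in cls}) == 1)
    return pos / len(rows)

def core(rows, labels, attrs):
    """Core = attributes whose removal lowers the dependency degree."""
    full = dependency(rows, labels, attrs)
    return [a for a in attrs
            if dependency(rows, labels, [b for b in attrs if b != a]) < full]

def reduct(rows, labels, attrs):
    """Start from the core, greedily add the attribute that raises dependency most."""
    full = dependency(rows, labels, attrs)
    red = core(rows, labels, attrs)
    while dependency(rows, labels, red) < full:
        best = max((a for a in attrs if a not in red),
                   key=lambda a: dependency(rows, labels, red + [a]))
        red.append(best)
    return sorted(red)

# --- KNN in map/reduce style (in-memory simulation of the Hadoop data flow) ---

def knn_map(split, queries, attrs, k):
    """Map: for every query, emit its k nearest neighbours found in this data split."""
    for qid, q in enumerate(queries):
        qv = [q[a] for a in attrs]
        yield qid, heapq.nsmallest(
            k, ((math.dist(qv, [r[a] for a in attrs]), lab) for r, lab in split))

def knn_reduce(candidate_lists, k):
    """Reduce: merge local candidates, keep the global k nearest, majority vote."""
    merged = heapq.nsmallest(k, (c for lst in candidate_lists for c in lst))
    return Counter(lab for _, lab in merged).most_common(1)[0][0]

def classify(train_rows, train_labels, queries, attrs, k=3, n_splits=2):
    data = list(zip(train_rows, train_labels))
    splits = [data[i::n_splits] for i in range(n_splits)]  # stand-in for HDFS input splits
    grouped = defaultdict(list)                            # stand-in for the shuffle phase
    for split in splits:
        for qid, local in knn_map(split, queries, attrs, k):
            grouped[qid].append(local)
    return [knn_reduce(grouped[qid], k) for qid in range(len(queries))]

# Toy decision table: label depends on attributes 0 and 1; attribute 3 duplicates 0.
rows = [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 1, 1), (1, 1, 0, 1),
        (0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 1, 0), (1, 0, 0, 1)]
labels = ["A", "B", "B", "A", "A", "A", "B", "B"]
queries = [(0, 0, 1, 1), (1, 0, 1, 0)]

red = reduct(rows, labels, [0, 1, 2, 3])
print(red)                                     # → [0, 1]
print(classify(rows, labels, queries, red))    # → ['A', 'B']
```

Each mapper emits only its local top-k per query, so the reducer merges at most k × n_splits candidates. The global k nearest neighbours are always among those local winners, which is why splitting the training data does not change the KNN result.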