Based on the characteristics of the Naive Bayes classification algorithm, we propose an improved feature selection method. Training and testing classifiers on modern large-scale data takes too long on a single machine, so we design and implement a Naive Bayes data classification algorithm on the Hadoop distributed platform. Experimental results show that the improved algorithm effectively raises classification accuracy, and that the parallel Naive Bayes data classification algorithm has relatively high execution efficiency and is suitable for processing and analyzing massive data.
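As a rough illustration of how Naive Bayes training can be split into map and reduce steps, the sketch below counts class and feature occurrences in plain Python. It is not the implementation described in the abstract: it runs in a single process rather than on Hadoop, it does not include the improved feature selection method, and the function names (`map_record`, `reduce_counts`, `train`, `classify`), the token-list record format, and the use of Laplace smoothing are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import chain


def map_record(label, tokens):
    """Map step (hypothetical): emit (key, 1) pairs for one labelled record."""
    yield ("class", label), 1
    for tok in tokens:
        yield ("feature", (label, tok)), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def train(records):
    """Run the map step over every record, then reduce the emitted pairs."""
    pairs = chain.from_iterable(map_record(lbl, toks) for lbl, toks in records)
    return reduce_counts(pairs)


def classify(totals, tokens):
    """Pick the class with the highest log posterior (Laplace smoothing assumed)."""
    class_counts = {k[1]: v for k, v in totals.items() if k[0] == "class"}
    vocab = {k[1][1] for k in totals if k[0] == "feature"}
    n_records = sum(class_counts.values())

    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Total number of feature occurrences observed for this class.
        label_total = sum(
            v for k, v in totals.items()
            if k[0] == "feature" and k[1][0] == label
        )
        score = math.log(count / n_records)
        for tok in tokens:
            seen = totals.get(("feature", (label, tok)), 0)
            score += math.log((seen + 1) / (label_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the reduce step only sums counts per key, the map calls are independent of one another, which is the property that would let a framework such as Hadoop spread them across machines.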