With the explosive growth of big data, improving the execution efficiency of algorithms has become a research focus in big data classification. Spark is a distributed parallel computing framework that supports iterative data flows. In this paper, the naive Bayes text classification algorithm is adapted for parallel stream processing. Experiments show that the parallel streaming Bayes classification algorithm can effectively improve the efficiency of big data classification.
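The abstract does not give the implementation, so the following is only a minimal sketch of why naive Bayes fits a parallel streaming setting: its sufficient statistics are counts, which can be computed per partition and merged by addition as each micro-batch arrives. Plain Python stands in for Spark here, and every name (`count_partition`, `merge`, `update`, `predict`), the toy data, and the use of Laplace smoothing are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter, defaultdict
from functools import reduce


def count_partition(docs):
    """Map step (hypothetical): per-class document and word counts for one partition."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for label, text in docs:
        class_docs[label] += 1
        word_counts[label].update(text.lower().split())
    return class_docs, word_counts


def merge(a, b):
    """Reduce step: counts are additive, so partial models merge by summation."""
    class_docs = a[0] + b[0]
    word_counts = defaultdict(Counter)
    for part in (a[1], b[1]):
        for label, counts in part.items():
            word_counts[label].update(counts)
    return class_docs, word_counts


def update(model, batch_partitions):
    """Fold one micro-batch (a list of partitions) into the running model."""
    return reduce(merge, map(count_partition, batch_partitions), model)


def predict(model, text):
    """Pick the class with the highest log-posterior, using Laplace smoothing."""
    class_docs, word_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy stream: two micro-batches, each already split into partitions.
stream = [
    [[("sports", "great football match"), ("tech", "new spark cluster")],
     [("sports", "football team wins")]],
    [[("tech", "spark streaming job"), ("tech", "cluster computing framework")]],
]

model = (Counter(), defaultdict(Counter))
for batch in stream:
    model = update(model, batch)

label_a = predict(model, "football match tonight")
label_b = predict(model, "spark cluster job")
```

In a real Spark job, the map step would run on the executors and the merge would be an associative reduction over partitions; the sketch only illustrates that the model can be updated incrementally without revisiting earlier batches.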