Chicken swarm foraging algorithm for big data classification using the deep belief network classifier

Sathyaraj R.; Ramanathan L.; Lavanya K.; Balasubramanian V.; Banu Saira J.

Abstract

Purpose: Big data is growing so quickly that conventional software tools face several problems in managing it. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint for researchers.

Design/methodology/approach: The paper introduces a big data classification technique that uses the MapReduce framework together with a proposed optimization algorithm, the chicken-based bacterial foraging (CBF) algorithm. CBF is obtained by integrating the bacterial foraging optimization (BFO) algorithm with the chicken swarm optimization (CSO) algorithm. The proposed model runs in two phases, training and testing. In the training phase, the big data produced by different distributed sources is processed in parallel by the mappers, which perform preprocessing and feature selection based on the proposed CBF algorithm. Preprocessing eliminates redundant and inconsistent data; feature selection then extracts the significant features from the preprocessed data to improve classification accuracy. The selected features are fed into the reducers, which classify the data into various classes using a deep belief network (DBN) classifier trained with the proposed CBF algorithm. At the end of the training process, the individual reducers present the trained models, which are then used to handle incremental data. In the testing phase, the incremental data are split into subsets and fed into different mappers. Each mapper contains a trained model obtained from the training phase and uses it to classify the incremental data. After classification, the outputs of the mappers are fused and fed into the reducer for classification.

Findings: The maximum accuracy and Jaccard coefficient are obtained on the epileptic seizure recognition database. The proposed CBF-DBN achieves a maximum accuracy of 91.129%, whereas the existing neural network (NN), DBN and naive Bayes classifier-term frequency-inverse document frequency (NBC-TFIDF) methods achieve 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN achieves a maximum Jaccard coefficient of 88.928%, whereas the existing NN, DBN and NBC-TFIDF methods achieve 75.891%, 79.850% and 81.103%, respectively.

Originality/value: The paper proposes a big data classification method for categorizing massive data sets under the constraints of huge data. Classification is performed on the MapReduce framework in training and testing phases so that the data are handled in parallel. In the training phase, the big data is partitioned into subsets that are fed into the mappers, where a feature extraction step extracts the significant features. The obtained features are passed to the reducers, which classify the data using the DBN classifier trained with the proposed CBF algorithm.
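
The abstract reports its results as accuracy and Jaccard coefficient but does not define either metric. As a rough illustration only, the sketch below computes both for a binary labelling under the common definitions: accuracy as the fraction of matching labels, and Jaccard as TP / (TP + FP + FN) for a chosen positive class. The function names and the toy labels are hypothetical and are not taken from the paper, which may use a different formulation.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def jaccard_coefficient(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable = 1,
) -> float:
    """Jaccard index of the positive class: TP / (TP + FP + FN).

    Assumption: a binary, one-class-versus-rest view of the labels.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must be equal in length")
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = tp + fp + fn
    # Convention chosen here: if the positive class never occurs, return 1.0.
    return tp / denom if denom else 1.0


# Toy labels, invented for illustration (not data from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))             # 6 of 8 labels match
print(jaccard_coefficient(y_true, y_pred))  # TP=3, FP=1, FN=1
```

Because Jaccard ignores true negatives, it is always at most the accuracy for the same predictions on the positive class, which is consistent with the reported Jaccard values being lower than the reported accuracies.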

