International Conference on Informatics and Computing

Comparison of Multinomial Naïve Bayes with K-Nearest Neighbors, Support Vector Machine and Random Forest for Classification of “Network Attacks” Document


Abstract

This paper categorizes English documents on the topic “Network Attack” using the Multinomial Naïve Bayes (MNB) method and compares it with K-Nearest Neighbors (KNN), linear Support Vector Machine (SVM Linear) and Random Forest. Classification was carried out with three feature extraction methods: Term Frequency-Inverse Document Frequency (TF-IDF), Count Vector, and Document Vector (Doc2vec). In the experiments, MNB with TF-IDF reached an accuracy of 76.00%. With TF-IDF, KNN, SVM Linear and Random Forest reached accuracies of 72.66%, 78.66% and 81.66% respectively; with Count Vector, the accuracies were 60.00%, 77.00%, 70.66% and 81.00% (MNB, KNN, SVM Linear, Random Forest). An experiment was also conducted with Random Forest as the classifier and Doc2vec as the feature extraction method, which gave an accuracy of 63.33%. MNB classified the documents somewhat better than KNN, while SVM and Random Forest outperformed both MNB and KNN. It can be concluded that TF-IDF was generally better than Count Vector and Doc2vec. However, Count Vector gave a better result than TF-IDF under the MNB classifier.
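The MNB part of the comparison described in the abstract (Count Vector or TF-IDF features fed to a Multinomial Naïve Bayes classifier and scored by accuracy) can be illustrated with a minimal, standard-library-only sketch. The abstract does not state which implementation, tokenizer, or smoothing the authors used, so everything here is an assumption: the toy corpus and its labels are hypothetical, the smoothed IDF formula and Laplace smoothing (`alpha=1.0`) are common choices rather than the paper's, and the helper names are invented for illustration. KNN, SVM Linear, Random Forest and Doc2vec are omitted.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenizer; the paper's preprocessing is not specified.
    return text.lower().split()


def count_vector(tokens):
    # "Count Vector" features: raw term counts for one document.
    return dict(Counter(tokens))


def fit_idf(docs_tokens):
    # Smoothed IDF, ln((1 + N) / (1 + df)) + 1 (an assumed variant).
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    # TF-IDF features: term count times IDF; unseen terms are dropped.
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = sorted({t for v in vectors for t in v})
        n = len(labels)
        self.log_prior = {c: math.log(k / n) for c, k in Counter(labels).items()}
        # Sum the feature weights of each term per class.
        totals = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for term, weight in vec.items():
                totals[label][term] += weight
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {
                t: math.log((totals[c].get(t, 0.0) + self.alpha) / denom)
                for t in vocab
            }
        return self

    def predict(self, vec):
        def score(c):
            # Log prior plus weighted log-likelihoods; unknown terms are skipped.
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vec.items() if t in self.log_lik[c]
            )
        return max(self.classes, key=score)


def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy corpus, not the paper's dataset.
train = [
    ("ddos attack flooded the network server", "attack"),
    ("malware attack exploited the firewall", "attack"),
    ("phishing attack stole network passwords", "attack"),
    ("the team won the football match", "other"),
    ("new recipe for chocolate cake", "other"),
    ("stock market closed higher today", "other"),
]
test = [
    ("network attack on the server", "attack"),
    ("chocolate cake recipe", "other"),
]

train_tokens = [tokenize(text) for text, _ in train]
train_labels = [label for _, label in train]
test_tokens = [tokenize(text) for text, _ in test]
test_labels = [label for _, label in test]

# MNB on Count Vector features.
count_model = MultinomialNB().fit([count_vector(t) for t in train_tokens], train_labels)
count_preds = [count_model.predict(count_vector(t)) for t in test_tokens]

# MNB on TF-IDF features (IDF is fitted on the training documents only).
idf = fit_idf(train_tokens)
tfidf_model = MultinomialNB().fit([tfidf_vector(t, idf) for t in train_tokens], train_labels)
tfidf_preds = [tfidf_model.predict(tfidf_vector(t, idf)) for t in test_tokens]

print("Count Vector accuracy:", accuracy(count_preds, test_labels))
print("TF-IDF accuracy:", accuracy(tfidf_preds, test_labels))
```

On a corpus this small both feature sets separate the classes trivially, so the sketch only shows the mechanics; the accuracy gaps reported in the abstract would only appear on a realistic dataset with a held-out test split.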
