Performance Comparison of Naïve Bayes and Complement Naïve Bayes Algorithms


Abstract

Big data is commonly characterized by three Vs: volume, velocity and variety. Its size and complexity make it hard to analyze, store and process, and traditional tools take too long to execute on it. Dedicated tools and libraries reduce this time. For example, Hadoop is an open-source library that allows distributed computing over large datasets; Mahout provides machine learning, Hive provides querying and Kafka provides messaging. In this paper, Hadoop and Mahout are used to compare the performance of the Naïve Bayes and Complement Naïve Bayes algorithms in terms of average percentage of correctly classified instances, average training time and average testing time across different dataset sizes. The "20 Newsgroups" dataset is used, and its size is increased by scaling it by factors of 2, 4 and 8, which produces datasets of size 37692, 75384 and 150768. Every experiment is run 10 times on each dataset with different smoothing, weight and normalization parameters, and the results are averaged. The experiments show that Naïve Bayes outperforms Complement Naïve Bayes on average training time, whereas Complement Naïve Bayes outperforms Naïve Bayes on average percentage of correctly classified instances.
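
The comparison protocol described in the abstract (scale the dataset, train both classifiers, and average accuracy, training time and testing time over repeated runs) can be sketched in a few lines. The sketch below is an illustration only: it is plain single-machine Python rather than Mahout on Hadoop, the toy corpus stands in for "20 Newsgroups", the names `scale`, `fit`, `predict` and `evaluate` are hypothetical, and the weight and normalization parameters mentioned in the abstract are not modelled. It assumes additive smoothing with parameter `alpha` and whitespace tokenization.

```python
import math
import time
from collections import Counter
from statistics import mean

# Hypothetical toy corpus standing in for "20 Newsgroups": (text, label) pairs.
TRAIN = [
    ("the team won the hockey game", "sport"),
    ("the pitcher threw a perfect baseball game", "sport"),
    ("the new graphics card renders images fast", "tech"),
    ("install the driver before using the scanner", "tech"),
]
TEST = [
    ("the hockey team lost the game", "sport"),
    ("the graphics driver crashed", "tech"),
]


def scale(docs, factor):
    """Enlarge a dataset by repeating it `factor` times (e.g. 2, 4 or 8)."""
    return docs * factor


def fit(docs, complement, alpha=1.0):
    """Return (log weights per class, log priors); `alpha` is additive smoothing."""
    per_class, class_docs = {}, Counter()
    for text, label in docs:
        per_class.setdefault(label, Counter()).update(text.split())
        class_docs[label] += 1
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    weights = {}
    for label, counts in per_class.items():
        # Naive Bayes uses word counts inside the class; Complement Naive
        # Bayes uses word counts from every *other* class.
        source = total - counts if complement else counts
        denom = sum(source.values()) + alpha * len(total)
        weights[label] = {w: math.log((source[w] + alpha) / denom) for w in total}
    priors = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    return weights, priors


def predict(model, text, complement):
    """Pick the best class; words unseen during training are ignored."""
    weights, priors = model
    scores = {}
    for label, w in weights.items():
        s = sum(w[t] for t in text.split() if t in w)
        # Complement NB prefers the class whose complement fits the text worst.
        scores[label] = -s if complement else priors[label] + s
    return max(scores, key=scores.get)


def evaluate(train, test, complement, runs=10):
    """Average accuracy (%), training time (s) and testing time (s) over `runs`."""
    accs, train_times, test_times = [], [], []
    for _ in range(runs):
        t0 = time.perf_counter()
        model = fit(train, complement)
        t1 = time.perf_counter()
        hits = sum(predict(model, text, complement) == label for text, label in test)
        t2 = time.perf_counter()
        accs.append(100.0 * hits / len(test))
        train_times.append(t1 - t0)
        test_times.append(t2 - t1)
    return mean(accs), mean(train_times), mean(test_times)


for factor in (2, 4, 8):
    data = scale(TRAIN, factor)
    for name, complement in (("NB", False), ("CNB", True)):
        acc, t_train, t_test = evaluate(data, TEST, complement)
        print(f"x{factor} n={len(data)} {name}: acc={acc:.1f}% "
              f"train={t_train:.6f}s test={t_test:.6f}s")
```

Timings from a toy corpus like this say nothing about the paper's results; the point is only the shape of the experiment, in which the same three averages are collected for both classifiers at each dataset size.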


Bibliographic details

Source: International Conference on Electrical and Electronics Engineering, 2019, pp. 131-138 (8 pages)
Authors: Berna Seref; Erkan Bostanci
Original format: PDF
Keywords: Training; Testing; Classification algorithms; Smoothing methods; Libraries; Big Data; Machine learning

