Venue: International Conference on Intelligent Human-Machine Systems and Cybernetics

Adaptive Naive Bayesian Classifier for Automatic Classification of Webpage from Massive Network Data

Abstract

This paper applies a Naïve Bayesian classifier to the automatic classification of webpages. Its key contributions are twofold: the large empirical data set is drawn from real traffic collected on the backbone network of a province in China, and cumulative probability is used to determine the optimal size of the feature vector adaptively. The adaptive cumulative-probability threshold selection method used in this study is shown to be robust. The paper focuses on four feature selection methods: TF-IDF (term frequency-inverse document frequency), IG (Information Gain), MOR (Multi-class Odds Ratio), and CDM (Class Discriminating Measure). We find that the Naïve Bayesian classifier performs well in both speed and precision on big data sets, with precision, recall, and F1 all above 90% in all six webpage categories.
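The abstract does not spell out how the cumulative-probability threshold is computed, so the following is only a minimal sketch of one plausible reading: rank the terms by a feature score (for example from TF-IDF, IG, MOR, or CDM), normalise the scores so they sum to 1, and keep the shortest ranked prefix whose cumulative share reaches a threshold. The function name `select_by_cumulative_share`, the `threshold` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_by_cumulative_share(scores: Dict[str, float],
                               threshold: float = 0.9) -> List[str]:
    """Keep the top-ranked terms whose normalised scores sum to >= threshold.

    Assumption: this is an illustrative interpretation of "cumulative
    probability" feature-vector sizing, not the paper's exact procedure.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(scores.values())
    if total <= 0:
        return []
    # Rank terms from highest to lowest score; ties are broken by name
    # so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    selected: List[str] = []
    cumulative = 0.0
    for term, score in ranked:
        selected.append(term)
        cumulative += score / total  # share of the total score mass
        if cumulative >= threshold:
            break
    return selected


# Hypothetical per-term scores for a single category.
toy_scores = {"football": 5.0, "league": 3.0, "the": 1.0, "page": 1.0}
print(select_by_cumulative_share(toy_scores, threshold=0.75))
```

Under this reading, the size of the feature vector is not fixed in advance: a category whose score mass is concentrated in a few terms gets a short vector, while a flatter score distribution yields a longer one.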
