Conference: International Conference on Multimedia and Network Information Systems

Comparison of the Novel Classification Methods on the Reuters-21578 Corpus


Abstract

The paper evaluates novel boosting methods applied to the commonly used Multinomial Naïve Bayes classifier. The evaluation is performed on the Reuters-21578 corpus, which consists of 10788 documents and 90 categories. All experiments use the tf-idf weighting model and the one-versus-the-rest strategy. The AdaBoost, XGBoost and Gradient Boost algorithms are tested, as is the impact of feature selection. The evaluation uses common metrics: precision, recall, F1 and precision-recall breakeven points. The novel aspect of this work is that all considered boosted methods are compared with each other and with several classical methods (Support Vector Machine methods and a Random Forests classifier). The results are much better than those in the classic Joachims paper and slightly better than those obtained with the maximum discrimination method for feature selection. This matters because, over the past 20 years, most work has focused on how results change when parameters are modified. Surprisingly, the result obtained with a feedforward neural network is comparable to that of Bayesian optimization over boosted Naïve Bayes, despite the medium size of the corpus. We plan to extend these results by using word embedding methods.
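The metrics named in the abstract can be illustrated for a single category under the one-versus-the-rest setup. The following is a minimal sketch, not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. It relies on the fact that precision equals recall when exactly as many documents are predicted positive as there are true positives, so the breakeven point is computed as precision at that rank.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one binary (one-vs-rest) category."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def breakeven_point(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Precision-recall breakeven point for one category.

    Predicting the top-R documents as positive, where R is the number of
    true positives in the data, makes precision and recall both TP / R.
    Ties in scores are broken by input order (an assumption of this sketch).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp_at_r = sum(label for _, label in ranked[:n_pos])
    return tp_at_r / n_pos


# Toy data (hypothetical): 1 = document belongs to the category, 0 = it does not.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]  # hypothetical classifier confidences
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
bep = breakeven_point(y_true, scores)
```

For a corpus with many categories, such per-category values are usually aggregated by micro- or macro-averaging; the abstract does not say which averaging the paper uses.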
