An empirical evaluation of text classification and feature selection methods

Muazzam Ahmed Siddiqui

Abstract

An extensive empirical evaluation of classifiers and feature selection methods for text categorization is presented. More than 500 models were trained and tested using different combinations of corpora, term weighting schemes, numbers of features, feature selection methods and classifiers. The performance measures used were the micro-averaged F measure and classifier training time. The experiments used five benchmark corpora, three term weighting schemes, three feature selection methods and four classifiers. Using all the features gave only a slight performance improvement over using only the top 20% of features selected by Information Gain or Chi Square. More importantly, this improvement was not statistically significant. A Support Vector Machine with a linear kernel performed best on the text categorization tasks, producing the highest F measures and low training times even in the presence of high class skew. We found a statistically significant difference between the performance of the Support Vector Machine and that of the other classifiers on text categorization problems.
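The abstract names Chi Square feature selection, keeping only the top 20% of features, and the micro-averaged F measure. The following is a minimal, stdlib-only Python sketch of those two pieces, not the author's code. It assumes binary term presence (the paper's three term weighting schemes are not reproduced), scores each term by its maximum Chi Square value over classes (one common aggregation; the paper's choice is not stated here), and treats labels as sets so multi-label corpora are handled. The function names `chi_square`, `select_top_fraction` and `micro_f1` are hypothetical; Information Gain and the classifiers themselves are not shown.

```python
from typing import List, Sequence, Set


def chi_square(docs: Sequence[Set[str]], labels: Sequence[str], term: str, cls: str) -> float:
    """Chi Square statistic for one (term, class) pair, computed from the 2x2
    contingency table of document counts (term present/absent x in class/not)."""
    n = len(docs)
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == cls)      # present, in class
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != cls)      # present, other class
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == cls)  # absent, in class
    d = n - a - b - c                                                           # absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_top_fraction(docs: Sequence[Set[str]], labels: Sequence[str],
                        fraction: float = 0.2) -> List[str]:
    """Rank the vocabulary by max Chi Square over classes and keep the top fraction
    (assumption: max-over-classes aggregation, at least one term kept)."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    k = max(1, int(len(vocab) * fraction))
    # Descending score; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def micro_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (document, class) decision."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # Equal to 2PR / (P + R) with P = tp / (tp + fp) and R = tp / (tp + fn).
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    docs = [{"goal", "match", "team"}, {"team", "win", "match"},
            {"vote", "party", "election"}, {"party", "win", "election"}]
    labels = ["sport", "sport", "politics", "politics"]
    print(select_top_fraction(docs, labels, fraction=0.6))  # ['election', 'match', 'party', 'team']
    print(round(micro_f1([{"sport"}, {"politics"}, {"sport", "politics"}],
                         [{"sport"}, {"sport"}, {"politics"}]), 4))  # 0.5714
```

Pooling the counts before computing F is what makes the average "micro": frequent classes dominate the score, which is one reason class skew matters when comparing classifiers on this measure.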

Bibliographic details

Source: Artificial Intelligence Research, 2016, no. 2, 12 pages
Author: Muazzam Ahmed Siddiqui
Original format: PDF
Language: English
CLC classification: Artificial intelligence theory
Keywords: Classifier comparison; Feature selection methods; Empirical evaluation; Text categorization; Class imbalance

Similar documents

1. An empirical evaluation of text classification and feature selection methods [J]. Muazzam Ahmed Siddiqui. Artificial Intelligence Research. 2016, no. 2
2. Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods [J]. Kou Gang, Yang Pei, Peng Yi. Applied Soft Computing. 2020
3. An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features [J]. Wan Cen, Freitas Alex A. Artificial Intelligence Review: An International Science and Engineering Journal. 2018, no. 2
4. Experimental evaluation of feature selection methods for text classification [C]. Uchyigit Gulden. Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on. 2012
5. Advanced features and feature selection methods for vibration and audio signal classification [D]. Tsau, Enshuo. 2012
6. Evaluating statistical learning methods for cell type classification and feature selection using RNA-seq data [O]. Hao Chen. 2014
7. An empirical evaluation of text classification and feature selection methods [O]. Muazzam Ahmed Siddiqui. 2016
