Class-dependent feature selection algorithm for text categorization

Abstract

A common approach in text categorization is to represent each word as a feature; however, many of these features are irrelevant. Dimensionality reduction is therefore an important step for reducing computational effort and improving accuracy. This paper presents a filter method for feature selection called Category-dependent Maximum f Features per Document (cMFDR), an extension of the MFDR algorithm. MFDR selects the best features from documents that exceed a threshold calculated over the whole dataset under evaluation. We show that a single global threshold is not an optimal strategy, since it disregards categories that contain few relevant features and thereby impairs classification precision. cMFDR instead computes one threshold per category, so that each category contributes its own, possibly different, number of features. Moreover, unlike in MFDR, the threshold calculation is not biased by documents with a large number of features. An experimental evaluation on four text categorization benchmarks, using three feature evaluation functions and the Multinomial Naïve Bayes classifier, showed the effectiveness of cMFDR: it obtains results better than or similar to those of MFDR in 98% of the cases.
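
The contrast between one global threshold and one threshold per category can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption: a document's relevance is taken to be the mean score of its terms, each threshold is taken to be the mean relevance of the documents it covers, and the names `select_features`, `docs`, `labels`, and `scores` are hypothetical. This is not the authors' implementation of MFDR or cMFDR.

```python
from collections import defaultdict
from statistics import mean


def select_features(docs, labels, scores, f=1, per_category=True):
    """Pick up to f top-scoring terms from each document that passes a threshold.

    docs: list of sets of terms; labels: one category per document;
    scores: term -> value of some feature evaluation function.
    """
    # ASSUMPTION: document relevance is the mean term score, so that long
    # documents are not favoured merely for containing many terms.
    relevance = [mean(scores[t] for t in d) if d else 0.0 for d in docs]

    # Group document indices: one group per category, or one global group.
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label if per_category else "__all__"].append(i)

    selected = set()
    for indices in groups.values():
        # ASSUMPTION: the threshold is the mean relevance within the group.
        threshold = mean(relevance[i] for i in indices)
        for i in indices:
            if relevance[i] >= threshold:
                # Sort by descending score, breaking ties alphabetically.
                top = sorted(docs[i], key=lambda t: (-scores[t], t))[:f]
                selected.update(top)
    return selected


# Toy data: the "poetry" category only has weakly scored terms.
docs = [{"goal", "match"}, {"goal", "the"}, {"verse", "the"}, {"rhyme", "the"}]
labels = ["sports", "sports", "poetry", "poetry"]
scores = {"goal": 0.9, "match": 0.8, "verse": 0.3, "rhyme": 0.2, "the": 0.1}

global_choice = select_features(docs, labels, scores, f=1, per_category=False)
category_choice = select_features(docs, labels, scores, f=1, per_category=True)
```

On this toy data the global threshold is passed only by the two sports documents, so the weakly scored category contributes no feature at all, whereas the per-category variant also keeps a term from it.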

Bibliographic record

Source
International Joint Conference on Neural Networks | 2016 | pp. 3508-3515 | 8 pages
Authors
Rogério C. P. Fragoso; Roberto H. W. Pinheiro; George D. C. Cavalcanti
Original format
PDF
Keywords
Training; Feature extraction; Text categorization; Search problems; Measurement; Computational efficiency; Vocabulary

