Feature selection and extraction for text classification.

Abstract

One inherent property of features in the text classification domain is redundancy. In this domain, words are used as features, and because words overlap in meaning, the resulting features are redundant to some degree. By selecting a feature set with lower redundancy for the classification task, the same classification performance can be obtained with fewer features.

In this thesis, a feature selector called MIFS-C, derived from the mutual information feature selection (MIFS) algorithm, is introduced. The algorithm requires an expression for the information that is added by the inclusion of a feature. This thesis improves the formulation of that expression, which in turn improves the classification results. An optimization is also presented that achieves a significant training-time speedup over the original algorithm. The MIFS algorithms require an appropriate value for a redundancy parameter; however, none of the previous works suggest how to select a suitable value. An algorithm to estimate an optimal value for this parameter is presented in this thesis.

A number of feature extraction techniques that generate more complex features, such as phrases and collocations, are also investigated. However, these features add more redundancy to the feature set, so a feature selection method that reduces this redundancy is required. Moreover, the overall finding is that little is gained by including such features in the feature set, even with a sophisticated feature selector such as MIFS-C. Therefore, better results can be achieved by focusing on better feature selection (for example, by using the MIFS-C algorithm) in conjunction with word-only features than by focusing on extracting complicated features.
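To make the role of the redundancy parameter concrete, here is a minimal sketch of the greedy selection criterion commonly attributed to the original MIFS algorithm: at each step, pick the feature f that maximizes I(C; f) - beta * sum of I(f; s) over the already selected features s. This is not the MIFS-C formulation, which the abstract does not spell out. The function names, the toy word-presence data, and the use of plain empirical counts to estimate mutual information are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mifs(features, labels, k, beta):
    """Greedy selection of k feature names (hypothetical helper).

    Score of a candidate f: I(C; f) - beta * sum_{s in selected} I(f; s).
    beta is the redundancy parameter; beta = 0 ignores redundancy entirely.
    """
    relevance = {
        name: mutual_information(column, labels)
        for name, column in features.items()
    }
    selected = []
    remaining = list(features)  # keeps insertion order, so ties are deterministic
    while remaining and len(selected) < k:
        def score(name):
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            )
            return relevance[name] - beta * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented): 8 documents, binary word-presence features.
# "inexpensive" is an exact copy of "cheap", i.e. a fully redundant synonym.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "cheap":       [1, 1, 1, 0, 0, 0, 0, 0],
    "inexpensive": [1, 1, 1, 0, 0, 0, 0, 0],
    "meeting":     [0, 0, 0, 0, 1, 1, 0, 0],
}

without_penalty = mifs(features, labels, k=2, beta=0.0)
with_penalty = mifs(features, labels, k=2, beta=1.0)
```

On this toy data, setting beta to 0 selects both synonyms because they share the highest relevance, while beta = 1 penalizes the duplicate and selects "meeting" as the second feature instead. How beta should be chosen in practice is exactly the question the thesis says it addresses; the value 1.0 here is arbitrary.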

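Bibliographic details

Author: Bakus, Jan
Author affiliation: University of Waterloo (Canada)
Degree-granting institution: University of Waterloo (Canada)
Subject: Engineering System Science
Degree: Ph.D.
Year: 2005
Pages: 153 p.
Format: PDF
Language: English
Chinese Library Classification: Systems science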

Similar literature

1. Bimodal spectroscopic evaluation of ultra violet-irradiated mouse skin inflammatory and precancerous stages: instrumentation, spectral feature extraction/selection and classification (k-NN, LDA and SVM) [J]. G. Diaz-Ayil, M. Amouroux, W.C.P.M. Blondel. The European Physical Journal Applied Physics. 2009, No. 1.
2. Hybrid dimension reduction by integrating feature selection with feature extraction method for text clustering [J]. Kusum Kumari Bharti, Pramod Kumar Singh. Expert Systems with Applications. 2015, No. 6.
3. Feature Extraction or Feature Selection for Text Classification: A Case Study on Phishing Email Detection [J]. Masoumeh Zareapoor, Seeja K. R. International Journal of Information Engineering and Electronic Business. 2015, No. 2.
4. A review on feature selection and feature extraction for text classification [C]. Foram P. Shah, Vibha Patel. Proceedings of the 2016 IEEE International Conference on Wireless Communications, Signal Processing and Networking. 2016.
5. Genetic algorithm optimized feature extraction and selection for ECG pattern classification [D]. Huang, Zhijian. 2002.
6. Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection [O]. Shengyu Liu, Buzhou Tang, Qingcai Chen. 2015.
7. Lexicon based feature extraction for emotion text classification [O]. Bandhakavi, Anil; Wiratunga, Nirmalie; Padmanabhan, Deepak. 2016.
8. Performance Comparison of Feature Extraction Algorithms for Target Detection and Classification [R]. A. Ray, N. M. Nasrabadi, S. Bahrampour, S. Sarka, T. Damarla. 2013.
