Feature Selection with Maximum Information Metric in Text Categorization

Abstract

Text categorization typically involves a very large number of features, most of which are irrelevant or noisy and can mislead the classifier. Feature selection is therefore often performed to improve both the efficiency and the effectiveness of text categorization. This paper proposes a novel feature selection approach for text categorization, called Maximum Information Metric (MIM), to select high-quality terms from documents. The method uses term weight and document frequency to construct the correlation between a term and each class and, based on information theory, aims to maximize the differences of a term across classes. We design an improved evaluation function that yields a ranking of the features. Experimental results on the standard Reuters-21578 and 20-Newsgroups corpora show that the new approach outperforms classic methods, including Information Gain (IG) and the Chi-square statistic (CHI), for text categorization.
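
The abstract does not give the MIM evaluation function, so it is not reproduced here. As a point of reference, the sketch below illustrates only the CHI baseline named above: each term is scored against each class from document-frequency counts, and terms are ranked by their maximum score over classes. The function names, the max-over-classes aggregation, and the toy corpus are assumptions for illustration, not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one (term, class) pair from document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. term occurs in every document): no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms_by_chi(docs):
    """Rank terms by their maximum chi-square score over all classes.

    docs: list of (set_of_terms, class_label) pairs.
    Returns (term, score) pairs, highest score first, ties broken by term.
    """
    classes = {label for _, label in docs}
    vocab = set().union(*(terms for terms, _ in docs))
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in classes:
            a = sum(1 for terms, label in docs if term in terms and label == cls)
            b = sum(1 for terms, label in docs if term in terms and label != cls)
            c = sum(1 for terms, label in docs if term not in terms and label == cls)
            d = len(docs) - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical four-document corpus with two classes.
docs = [
    ({"wheat", "price"}, "grain"),
    ({"wheat", "export"}, "grain"),
    ({"oil", "price"}, "crude"),
    ({"oil", "export"}, "crude"),
]
ranking = rank_terms_by_chi(docs)
for term, score in ranking:
    print(f"{term}: {score:.2f}")
```

In this toy corpus, "wheat" and "oil" each occur in exactly one class and rank first, while "price" and "export" are spread evenly across both classes and score zero. A feature selector would keep the top-ranked terms and discard the rest.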

Bibliographic details

Source
Information Science and Engineering (ICISE), 2009 | 2009 | pp. 857-860 | 4 pages
Authors
Haijuan Wang; Lixin Han; Xiaoqin Zeng; Zhilong Zhen
Original format
PDF
Chinese Library Classification
Information and communication theory
