Using micro-documents for feature selection: The case of ordinal text classification

Stefano Baccianella; Andrea Esuli; Fabrizio Sebastiani

Abstract

Most popular feature selection methods for text classification, such as information gain (also known as "mutual information"), chi-square, and odds ratio, are based on binary information indicating the presence/absence of the feature (or "term") in each training document. As such, these methods do not exploit a rich source of information, namely, how frequently the feature occurs in the training document (term frequency). In order to overcome this drawback, when doing feature selection we logically break down each training document of length k into k training "micro-documents", each consisting of a single word occurrence and endowed with the same class information as the original training document. This move has the double effect of (a) allowing all the original feature selection methods based on binary information to remain straightforwardly applicable, and (b) making them sensitive to term frequency information. We study the impact of this strategy in the case of ordinal text classification, a type of text classification dealing with classes lying on an ordinal scale, and recently made popular by applications in customer relationship management, market research, and Web 2.0 mining. We run experiments using four recently introduced feature selection functions, two learning methods of the support vector machines family, and two large datasets of product reviews. The experiments show that the use of this strategy substantially improves the accuracy of ordinal text classification.
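
The micro-document idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`to_micro_documents`, `entropy`, `information_gain`) and the four toy reviews are assumptions made for illustration, and information gain is used only because the abstract names it as a typical presence/absence method; the four ordinal feature selection functions evaluated in the paper are not reproduced here.

```python
from collections import Counter
from math import log2

def to_micro_documents(docs):
    """Split each (tokens, label) document of length k into k single-token
    documents, each carrying the label of the original document."""
    return [([tok], label) for tokens, label in docs for tok in tokens]

def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(docs, term):
    """Information gain of the binary event 'term is present in the document'."""
    with_term = Counter(label for tokens, label in docs if term in tokens)
    without_term = Counter(label for tokens, label in docs if term not in tokens)
    all_labels = with_term + without_term
    n = sum(all_labels.values())
    n_with = sum(with_term.values())
    n_without = n - n_with
    conditional = ((n_with / n) * entropy(with_term.values())
                   + (n_without / n) * entropy(without_term.values()))
    return entropy(all_labels.values()) - conditional

# Hypothetical toy reviews: "good" appears in every review, but far more
# often in the 5-star reviews than in the 1-star ones.
docs = [
    ("good good good fine".split(), 5),
    ("good good good nice".split(), 5),
    ("good bad bad bad".split(), 1),
    ("good poor poor poor".split(), 1),
]

doc_level = information_gain(docs, "good")
micro_level = information_gain(to_micro_documents(docs), "good")
print(f"document-level IG:       {doc_level:.4f}")
print(f"micro-document-level IG: {micro_level:.4f}")
```

On this toy data the document-level score is zero, because "good" is present in every review and presence alone cannot separate the classes. The same unmodified scoring function returns a positive score on the micro-documents, since the class distribution of the "good" occurrences now reflects how often the term is used in each class.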

Bibliographic details

Source: Expert Systems with Applications | 2013, Issue 11 | pp. 4687-4696 | 10 pages
Authors: Stefano Baccianella; Andrea Esuli; Fabrizio Sebastiani
Affiliation (all three authors): Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, 56124 Pisa, Italy
Original format: PDF
Language: English
Keywords: text classification; supervised learning; ordinal regression; feature selection
