Efficient n-gram construction for text categorization using feature selection techniques

Garcia Maximiliano; Maldonado Sebastian; Vairetti Carla

Abstract

In this paper, we present a novel approach for n-gram generation in text classification. The a-priori algorithm is adapted to prune word sequences by combining three feature selection techniques. Unlike the traditional two-step approach for text classification, in which feature selection is performed after the n-gram construction process, our proposal performs an embedded feature elimination during the application of the a-priori algorithm. The proposed strategy reduces the number of branches to be explored, speeding up the process and making the construction of all the word sequences tractable. Our proposal has the additional advantage of constructing a low-dimensional dataset containing only the features that are relevant for classification, which can be used directly without the need for a separate feature selection step. Experiments on text classification datasets for sentiment analysis demonstrate that our approach yields the best predictive performance when compared with other feature selection approaches, while also facilitating a better understanding of the words and phrases that explain a given task, in our case online reviews and ratings in various domains.
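
The embedded pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the three feature selection techniques, so the sketch assumes a single hypothetical relevance test (a minimum document frequency `min_df` plus a minimum gap `min_gap` between the class-conditional document frequencies). The function name `grow_ngrams` and all parameters are illustrative. An n-gram is only generated if its (n-1)-word prefix survived the previous level, so pruned branches are never extended.

```python
from collections import Counter


def grow_ngrams(docs, labels, max_n=3, min_df=2, min_gap=0.5):
    """Level-wise n-gram growth with pruning at every level.

    docs:   list of token lists
    labels: list of 0/1 class labels, one per document
    The relevance test (min_df and min_gap) is a stand-in assumption,
    not the combination of techniques used in the paper.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    kept_prev = None  # n-grams that survived the previous level
    selected = []

    for n in range(1, max_n + 1):
        pos_df, neg_df = Counter(), Counter()
        for tokens, y in zip(docs, labels):
            seen = set()  # count each n-gram once per document
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                # Only extend branches whose prefix survived the last level.
                if kept_prev is not None and gram[:-1] not in kept_prev:
                    continue
                seen.add(gram)
            (pos_df if y == 1 else neg_df).update(seen)

        kept = set()
        for gram in set(pos_df) | set(neg_df):
            df = pos_df[gram] + neg_df[gram]
            gap = abs(pos_df[gram] / max(n_pos, 1) - neg_df[gram] / max(n_neg, 1))
            if df >= min_df and gap >= min_gap:
                kept.add(gram)

        if not kept:
            break  # nothing left to extend
        selected.extend(sorted(kept))
        kept_prev = kept

    return selected


docs = [
    "not good at all".split(),
    "not good service".split(),
    "very good food".split(),
    "very good place".split(),
]
labels = [0, 0, 1, 1]
selected = grow_ngrams(docs, labels)
```

In this toy corpus the unigram "good" appears equally in both classes and is pruned, so no sequence starting with "good" is ever built, while "not good" and "very good" are reached through their surviving prefixes. Note that a relevance score, unlike plain support, is not guaranteed to be anti-monotone, so pruning on it is a heuristic choice in this sketch.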

Bibliographic record

Source: Intelligent Data Analysis, 2021, No. 3, pp. 509-525 (17 pages)
Authors: Garcia Maximiliano; Maldonado Sebastian; Vairetti Carla
Author affiliations:

Univ Los Andes, Santiago, Chile;

Univ Chile, Sch Econ & Business, Dept Management Control & Informat Syst, Santiago, Chile | Inst Sistemas Complejos Ingn (ISCI), Santiago, Chile;

Univ Los Andes, Santiago, Chile | Inst Sistemas Complejos Ingn (ISCI), Santiago, Chile

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords: Feature selection; text categorization; n-gram construction; text classification; sentiment analysis

Similar literature

1. Feature Selection for Efficient Text Categorization and Knowledge Discovery Using Classification Techniques [J]. A. Christy, P. Thambidurai. Asian Journal of Information Technology. 2006, No. 8
2. Experimental Investigation for Text Categorization Based on Hybrid Approach Using Feature Selection and Classification Techniques [J]. K. Sridharan, M. Chitra. Asian Journal of Information Technology. 2016, No. 14
3. Text Categorization of Heart, Lung, and Blood Studies in the Database of Genotypes and Phenotypes (dbGaP) Utilizing n-grams and Metadata Features [J]. Mindy K. Ross, Ko-Wei Lin, Karen Truong, Biomedical Informatics Insights. 2013, No. 1
4. Effects of various preprocessing techniques to Turkish text categorization using n-gram features [C]. Ayça Deniz, Hakan Ezgi Kiziloz. 2017 International Conference on Computer Science and Engineering. 2017
5. Study of feature selection algorithms for text-categorization [D]. Dave, Kandarp. 2011
6. Text Categorization of Heart Lung and Blood Studies in the Database of Genotypes and Phenotypes (dbGaP) Utilizing n-grams and Metadata Features [O]. Mindy K. Ross, Ko-Wei Lin, Karen Truong, 2013
7. N-GRAM AND KLD BASED EFFICIENT FEATURE SELECTION APPROACH FOR TEXT CATEGORIZATION [O]. 2017
