Journal: Intelligent Data Analysis

Efficient n-gram construction for text categorization using feature selection techniques


Abstract

In this paper, we present a novel approach for n-gram generation in text classification. The Apriori algorithm is adapted to prune word sequences by combining three feature selection techniques. Unlike the traditional two-step approach for text classification, in which feature selection is performed after the n-gram construction process, our proposal performs an embedded feature elimination during the application of the Apriori algorithm. The proposed strategy reduces the number of branches to be explored, speeding up the process and making the construction of all the word sequences tractable. Our proposal has the additional advantage of constructing a low-dimensional dataset containing only the features that are relevant for classification, which can be used directly without the need for a separate feature selection step. Experiments on text classification datasets for sentiment analysis demonstrate that our approach yields the best predictive performance when compared with other feature selection approaches, while also facilitating a better understanding of the words and phrases that explain a given task; in our case, online reviews and ratings in various domains.
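The general idea of pruning during n-gram construction, rather than after it, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the three feature selection techniques, so the sketch substitutes two hypothetical stand-in criteria (a minimum document frequency `min_df` and a simple class-separation score thresholded by `min_score`). The function names, parameters, and toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams_in(tokens, n):
    """Return the set of contiguous n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_ngrams(docs, labels, max_n=3, min_df=2, min_score=0.5):
    """Level-wise (Apriori-style) n-gram construction with embedded pruning.

    Hypothetical stand-in criteria, not the ones used in the paper:
      - min_df: minimum number of documents containing the n-gram
      - min_score: minimum absolute difference between the fraction of
        positive and of negative documents containing the n-gram
    """
    n_pos = max(1, sum(1 for y in labels if y == 1))
    n_neg = max(1, sum(1 for y in labels if y != 1))
    selected = []      # all surviving n-grams, across levels
    survivors = set()  # surviving (n-1)-grams from the previous level

    for n in range(1, max_n + 1):
        pos_df, neg_df = defaultdict(int), defaultdict(int)
        for tokens, y in zip(docs, labels):
            for gram in ngrams_in(tokens, n):
                # Apriori-style candidate check: skip any n-gram whose
                # (n-1)-gram prefix or suffix was eliminated earlier.
                if n > 1 and (gram[:-1] not in survivors
                              or gram[1:] not in survivors):
                    continue
                (pos_df if y == 1 else neg_df)[gram] += 1

        current = set()
        for gram in set(pos_df) | set(neg_df):
            df = pos_df[gram] + neg_df[gram]
            score = abs(pos_df[gram] / n_pos - neg_df[gram] / n_neg)
            # Embedded feature elimination: weak n-grams are dropped here,
            # so they never spawn longer candidates.
            if df >= min_df and score >= min_score:
                current.add(gram)

        if not current:
            break  # no branch left to extend
        selected.extend(sorted(current))
        survivors = current

    return selected


# Toy sentiment data (made up for illustration): 1 = positive, 0 = negative.
docs = [
    "not worth buying".split(),
    "not worth it".split(),
    "really great value".split(),
    "really great phone".split(),
]
labels = [0, 0, 1, 1]
selected = build_ngrams(docs, labels)
```

On this toy data, only four unigrams and two bigrams ("not worth", "really great") survive, and no trigram candidate is ever scored, which mirrors the branch reduction the abstract describes. Note that a relevance score is not guaranteed to be anti-monotone the way support is in the classic Apriori setting, so pruning on it in this sketch is a heuristic.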
