Journal: International Journal of Combinatorial Optimization Problems and Informatics

Latent Dirichlet Allocation complement in the vector space model for Multi-Label Text Classification

Abstract

In text classification, one of the main problems is choosing the features that give the best results. Candidate features include words, n-grams, and syntactic n-grams of various types (POS tags, dependency relations, mixed, etc.), as well as combinations of these. Dimensionality reduction algorithms such as Latent Dirichlet Allocation (LDA) can also be applied to these feature sets. In this paper, we consider the multi-label text classification task and apply various feature sets to a subset of multi-labeled files from the Reuters-21578 corpus. We use traditional tf-idf values of the features and experiment both with and without stop words. We also try several combinations of features, such as unigrams together with bigrams. Finally, we add LDA results to the vector space model as new features; these last experiments obtained the best results.
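
The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the helper names (`extract_features`, `tfidf_matrix`, `append_topic_features`), the toy documents, the stop-word set, and the tf-idf variant (raw term frequency times log(N / df)) are all assumptions, and the per-document topic proportions are hypothetical placeholders for the output of an LDA model.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, drop stop words, emit unigrams and bigrams."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return ngrams(tokens, 1) + ngrams(tokens, 2)


def tfidf_matrix(docs, stop_words=frozenset()):
    """Build dense tf-idf rows: raw term frequency times log(N / df)."""
    counts = [Counter(extract_features(d, stop_words)) for d in docs]
    vocab = sorted(set().union(*counts))
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    rows = [[c[t] * math.log(n_docs / df[t]) for t in vocab] for c in counts]
    return rows, vocab


def append_topic_features(rows, topic_rows):
    """Concatenate each tf-idf row with that document's topic proportions."""
    return [r + list(t) for r, t in zip(rows, topic_rows)]


# Toy documents (illustrative only, not taken from Reuters-21578).
docs = [
    "oil prices rise as crude supply falls",
    "wheat and corn exports rise",
    "crude oil exports fall",
]
# Hypothetical topic proportions for 2 topics, standing in for LDA output.
doc_topics = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

rows, vocab = tfidf_matrix(docs, stop_words={"as", "and"})
features = append_topic_features(rows, doc_topics)
```

Each row of `features` holds the unigram and bigram tf-idf weights followed by the topic proportions, so any multi-label classifier that accepts fixed-length vectors could be trained on it. Passing an empty `stop_words` set gives the variant that keeps stop words.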