Latent Dirichlet Allocation complement in the vector space model for Multi-Label Text Classification

Víctor Carrera-Trejo; Grigori Sidorov; Sabino Miranda-Jiménez; Marco Moreno Ibarra; Rodrigo Cadena Martínez

Abstract

In the text classification task, one of the main problems is choosing the features that give the best results. Various features can be used, such as words, n-grams, and syntactic n-grams of various types (POS tags, dependency relations, mixed, etc.), or a combination of these features. Dimensionality reduction algorithms such as Latent Dirichlet Allocation (LDA) can also be applied to these feature sets. In this paper, we consider the multi-label text classification task and apply various feature sets to a subset of multi-labeled files from the Reuters-21578 corpus. We used traditional tf-idf values of the features and tried both keeping and removing stop words. We also tried several feature combinations, such as unigrams together with bigrams. Finally, we experimented with adding LDA results to Vector Space Models as new features; these last experiments obtained the best results.
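The feature construction described in the abstract can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions, not the authors' implementation. The helper names (`tokenize`, `ngram_features`, `tfidf`, `lda_gibbs`, `vsm_with_lda`) are hypothetical, and several other details are illustrative choices:

- the toy stop word list;
- whitespace tokenization;
- the idf variant log(N / df);
- collapsed Gibbs sampling as the LDA inference method, with its `alpha`, `beta`, and iteration settings.

Each document becomes a tf-idf vector over unigrams and bigrams, with stop words optionally removed. Its LDA topic proportions are then appended as extra dimensions of the vector space model.

```python
import math
import random
from collections import Counter

# Hypothetical toy stop word list (assumption; not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "on"}


def tokenize(text, remove_stop_words=True):
    """Lowercase whitespace tokenization, optionally dropping stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def ngram_features(tokens, max_n=2):
    """Unigrams followed by n-grams up to max_n, joined with '_'."""
    feats = list(tokens)
    for n in range(2, max_n + 1):
        feats += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def tfidf(docs_feats):
    """Dense tf-idf vectors with raw term frequency and idf = log(N / df)."""
    vocab = sorted({f for feats in docs_feats for f in feats})
    df = Counter(f for feats in docs_feats for f in set(feats))
    idf = {f: math.log(len(docs_feats) / df[f]) for f in vocab}
    vectors = []
    for feats in docs_feats:
        tf = Counter(feats)
        vectors.append([tf[f] * idf[f] for f in vocab])
    return vocab, vectors


def lda_gibbs(docs_tokens, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs_tokens for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs_tokens]  # topic counts per document
    n_kw = [[0] * V for _ in range(n_topics)]     # word counts per topic
    n_k = [0] * n_topics                           # total tokens per topic
    z = []                                         # topic of every token
    for d, doc in enumerate(docs_tokens):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)  # random initial assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs_tokens):
            for i, w in enumerate(doc):
                k, v = z[d][i], w_id[w]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed document-topic proportions theta_d (each row sums to 1).
    return [[(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs_tokens)]


def vsm_with_lda(texts, n_topics=2, remove_stop_words=True):
    """Append LDA topic proportions to unigram+bigram tf-idf vectors."""
    tokens = [tokenize(t, remove_stop_words) for t in texts]
    vocab, vectors = tfidf([ngram_features(t) for t in tokens])
    theta = lda_gibbs(tokens, n_topics=n_topics)  # LDA runs on unigrams only
    names = vocab + [f"topic_{k}" for k in range(n_topics)]
    return names, [vec + th for vec, th in zip(vectors, theta)]


if __name__ == "__main__":
    texts = [
        "oil prices rise as crude supply falls",
        "crude oil exports and oil prices",
        "wheat and grain harvest exports",
        "grain prices and wheat supply",
    ]
    names, X = vsm_with_lda(texts, n_topics=2)
    print(len(names), "features per document")
    for row in X:
        print([round(p, 2) for p in row[-2:]])  # appended topic proportions
```

The rows of `X` would then feed a multi-label classifier, for example one binary classifier per label (binary relevance). The abstract does not state which classifier or how many topics the authors used, so those remain open choices here. Appending the topic proportions keeps the lexical tf-idf evidence and adds a small, dense summary of each document's themes. Replacing the tf-idf features with the topic proportions would discard that lexical evidence.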

Bibliographic record

Source: International Journal of Combinatorial Optimization Problems and Informatics | 2015, No. 1 | 13 pages
Original format: PDF
Chinese Library Classification: Algorithm theory

Similar documents

1. Cardiology record multi-label classification using latent Dirichlet allocation [J]. Jorge Pérez, Alicia Pérez, Arantza Casillas. Computer Methods and Programs in Biomedicine: An International Journal Devoted to the Development, Implementation and Exchange of Computing Methodology and Software Systems in Biomedical Research and Medical Practice. 2018.
2. Arabic Text Classification Framework Based on Latent Dirichlet Allocation [J]. Ayadi Rami, Maraoui Mohsen, Mars Mourad. Journal of Computing and Information Technology. 2012, No. 2.
3. Arabic Text Classification Framework Based on Latent Dirichlet Allocation [J]. Mounir Zrigui, Rami Ayadi, Mourad Mars. Journal of Computing and Information Technology. 2012, No. 2.
4. Semi-supervised Latent Dirichlet Allocation for Multi-label Text Classification [C]. Youwei Lu, Shogo Okada, Katsumi Nitta. Recent Trends in Applied Artificial Intelligence. 2013.
5. Text Processing for the Effective Application of Latent Dirichlet Allocation [D]. Alexandra Kathryn Schofield. 2019.
6. Leveraging Latent Dirichlet Allocation in processing free-text personal goals among patients undergoing bladder cancer surgery [O]. Yuelin Li, Bruce Rapkin, Thomas M. Atkinson.
7. A text classification model constructed by Latent Dirichlet Allocation and Deep Learning [O]. Yu Liu, Zhengping Jin. 2015.
