International Conference on Behavioral, Economic, and Socio-Cultural Computing

Latent Semantic Analysis Boosted Convolutional Neural Networks for Document Classification



Abstract

Convolutional neural networks (CNNs) have been shown to be effective in document classification tasks. CNNs can be set up with many different architectures and parameter settings, which can make them difficult to implement. For many document classification tasks, data transformed with n-grams (typically unigrams, bigrams, and trigrams) and term-frequency inverse-document-frequency (TFIDF) weighting are still considered effective baseline models when used with linear classifiers such as logistic regression, especially on smaller datasets with fewer than 500K observations. A parsimonious CNN baseline model for sentiment classification should replicate the ease of use of linear methods. In this study, we introduce a Latent Semantic Analysis (LSA) based CNN model, in which natively trained LSA word vectors are used as input to parallel 1-dimensional convolutional layers (1D-CNNs). The LSA word vector model is obtained by applying singular value decomposition (SVD) to the data transformed with unigram counts and TFIDF weighting. The convolutional layers are therefore designed with window sizes best suited to LSA word vectors. This parsimonious LSA-based CNN model exceeds the accuracy of all linear classifiers using n-grams with TFIDF on all analyzed datasets, with an average improvement of 0.73% for the top-performing LSA-based CNN models. This may be because CNNs are more adept at capturing word relationships in phrases and sentences that are not necessarily present in the training corpus. Furthermore, the LSA-based CNN model also exceeds the performance of word2vec-based CNN models. Thus, the success of LSA-based CNNs may potentiate their use as a baseline in classification tasks alongside linear models. We also provide guiding principles to simplify the application of LSA-based CNNs in document classification tasks.
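The abstract outlines a two-stage pipeline: LSA word vectors are derived by applying SVD to a unigram TFIDF matrix, and sequences of these vectors are then fed into parallel 1D convolutional branches with different window sizes. Below is a minimal sketch of that pipeline, assuming scikit-learn and Keras; the 300-dimensional LSA space, 100 filters per branch, and window widths of 2-4 are illustrative assumptions rather than values reported in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from tensorflow.keras import layers, models

def build_lsa_word_vectors(corpus, dim=300):
    """Derive LSA word vectors from a unigram TFIDF matrix via truncated SVD."""
    tfidf = TfidfVectorizer(ngram_range=(1, 1))
    X = tfidf.fit_transform(corpus)          # documents x terms, TFIDF-weighted
    svd = TruncatedSVD(n_components=dim)
    svd.fit(X)
    # components_ has shape (dim, terms); transpose to get one vector per term.
    word_vectors = svd.components_.T         # terms x dim
    vocab = tfidf.vocabulary_                # term -> row index into word_vectors
    return word_vectors, vocab

def build_parallel_1d_cnn(seq_len, dim=300, windows=(2, 3, 4), n_classes=2):
    """Parallel 1D convolutions over sequences of LSA word vectors."""
    inp = layers.Input(shape=(seq_len, dim))
    branches = []
    for w in windows:
        conv = layers.Conv1D(filters=100, kernel_size=w, activation="relu")(inp)
        branches.append(layers.GlobalMaxPooling1D()(conv))
    merged = layers.Concatenate()(branches)
    out = layers.Dense(n_classes, activation="softmax")(merged)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In use, each document would be mapped to a (seq_len, dim) matrix by looking up its tokens in the LSA word-vector table (padding or truncating to seq_len) before being passed to the model.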

