Journal: Web Intelligence and Agent Systems

Document classification using convolutional neural networks with small window sizes and latent semantic analysis

Abstract

A parsimonious convolutional neural network (CNN) for text document classification that replicates the ease of use and high classification performance of linear methods is presented. This new CNN architecture can leverage locally trained latent semantic analysis (LSA) word vectors. The architecture is based on parallel 1D convolutional layers with small window sizes, ranging from 1 to 5 words. To test the efficacy of the new CNN architecture, three balanced text datasets that are known to perform exceedingly well with linear classifiers were evaluated. Also, three additional imbalanced datasets were evaluated to gauge the robustness of the LSA vectors and small window sizes. The new CNN architecture consisting of 1 to 4-grams, coupled with LSA word vectors, exceeded the accuracy of all linear classifiers on balanced datasets with an average improvement of 0.73%. In four out of the total six datasets, the LSA word vectors provided a maximum classification performance on par with or better than word2vec vectors in CNNs. Furthermore, in four out of the six datasets, the new CNN architecture provided the highest classification performance. Thus, the new CNN architecture and LSA word vectors could be used as a baseline method for text classification tasks.
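The architecture described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the activation, the pooling, the number of filters, or the training procedure, so the ReLU, the max-over-time pooling, the toy term-document matrix, the random untrained kernels, and the function names `lsa_word_vectors`, `conv1d_max`, and `parallel_cnn_features` are all assumptions made for illustration. The sketch only shows how locally computed LSA word vectors might feed parallel 1D convolution branches with window sizes 1 to 4.

```python
import numpy as np


def lsa_word_vectors(term_doc, dim):
    """LSA word vectors from a truncated SVD of a term-document matrix.

    term_doc: (n_words, n_docs) count matrix. Returns (n_words, dim).
    """
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the top `dim` singular directions, scaled by their singular values.
    return u[:, :dim] * s[:dim]


def conv1d_max(embedded, kernel, bias):
    """One branch: valid 1D convolution over word positions, ReLU, max pooling.

    embedded: (seq_len, dim); kernel: (window, dim, n_filters); bias: (n_filters,)
    ReLU and max-over-time pooling are assumptions, not taken from the abstract.
    """
    window = kernel.shape[0]
    seq_len = embedded.shape[0]
    # Every run of `window` consecutive word vectors: (n_positions, window, dim)
    windows = np.stack([embedded[i:i + window] for i in range(seq_len - window + 1)])
    feature_maps = np.einsum("pwd,wdf->pf", windows, kernel) + bias
    return np.maximum(feature_maps, 0.0).max(axis=0)


def parallel_cnn_features(embedded, kernels, biases):
    """Run all branches on the same input and concatenate their pooled outputs."""
    return np.concatenate([conv1d_max(embedded, k, b) for k, b in zip(kernels, biases)])


# Toy term-document counts (made-up data): 6 words x 4 documents.
term_doc = np.array([
    [2, 0, 1, 0],
    [0, 3, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [1, 0, 0, 3],
    [0, 2, 1, 0],
], dtype=float)

vectors = lsa_word_vectors(term_doc, dim=3)

# A five-word document expressed as word indices into the vocabulary.
doc = [0, 2, 3, 1, 5]
embedded = vectors[doc]  # shape (5, 3)

rng = np.random.default_rng(0)
n_filters = 8                  # hypothetical filter count per branch
window_sizes = (1, 2, 3, 4)    # the 1- to 4-gram branches
kernels = [rng.normal(size=(w, 3, n_filters)) for w in window_sizes]
biases = [np.zeros(n_filters) for _ in window_sizes]

features = parallel_cnn_features(embedded, kernels, biases)
print(features.shape)  # (32,): 4 branches x 8 filters
```

In a full classifier the concatenated feature vector would go to a trainable output layer, and the kernels would be learned rather than drawn at random. Those parts are omitted here because the abstract gives no details about them.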
