International Conference on Informatics and Computing

Experiments on Character and Word Level Features for Text Classification Using Deep Neural Network

Abstract

Text classification is the task of automatically assigning text documents to one or more classes according to their content. Recently, character-level models using deep neural networks have been developed to classify text. In some cases, character-level models have outperformed word-level models and traditional models, especially on user-generated datasets. The topologies that have been used for character-level models are convolutional neural networks (CNN) and bidirectional recurrent neural networks (Bi-RNN), together with the Bi-RNN variants built on long short-term memory (LSTM) and gated recurrent units (GRU). In this paper, CNN, Bi-RNN, and a combination of both are tested with character-level and word-level features for text classification on English and Indonesian social media datasets. On small datasets, word-level models outperformed character-level models. However, on datasets with millions of samples, character-level models outperformed word-level models. The paper also presents further analysis of the evaluation of word-level and character-level models.
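The abstract contrasts character-level and word-level input features. The following is a minimal sketch of how the same text becomes an index sequence at each level before being fed to the embedding layer of a CNN or Bi-RNN. The lowercase whitespace tokenizer, the reserved PAD/UNK indices, the tiny training sentences, and the sequence lengths are all illustrative assumptions, not the paper's preprocessing or settings.

```python
# Hypothetical reserved indices (assumption, not taken from the paper).
PAD, UNK = 0, 1


def build_vocab(token_lists):
    """Assign each distinct token an index starting at 2, in sorted order."""
    vocab_tokens = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i + 2 for i, tok in enumerate(vocab_tokens)}


def word_tokens(text):
    # Word-level features: naive lowercase + whitespace split (assumption).
    return text.lower().split()


def char_tokens(text):
    # Character-level features: every character, including spaces, is a token.
    return list(text.lower())


def encode(tokens, vocab, max_len):
    """Map tokens to indices (unknown -> UNK), truncate, then right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Illustrative training texts (hypothetical data).
train = ["the movie was great", "the plot was boring"]
word_vocab = build_vocab([word_tokens(t) for t in train])
char_vocab = build_vocab([char_tokens(t) for t in train])

# A user-generated misspelling that never appeared in training.
noisy = "the movie was grreat"
word_ids = encode(word_tokens(noisy), word_vocab, max_len=6)
char_ids = encode(char_tokens(noisy), char_vocab, max_len=24)
# word_ids -> [6, 4, 7, 1, 0, 0]: "grreat" collapses to UNK at word level.
# char_ids contains no UNK: every character of "grreat" is already known.
```

In this toy case the misspelled word is lost entirely at word level but stays representable at character level. This is one plausible reason character-level models can suit noisy user-generated text, though the abstract itself reports that word-level models still won on the small datasets.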