Conference: International Conference on Informatics and Computing

Experiments on Character and Word Level Features for Text Classification Using Deep Neural Network


Abstract

Text classification is the task of automatically assigning text documents to one or more classes according to their content. Recently, character-level models using deep neural networks have been developed for text classification. In some cases, character-level models have outperformed word-level models and traditional models, especially on user-generated datasets. The topologies used for character-level models are convolutional neural networks (CNN) and bidirectional recurrent neural networks (Bi-RNN) with their variants, long short-term memory (LSTM) and gated recurrent units (GRU). In this paper, CNN, Bi-RNN, and the combination of both are tested with character-level and word-level features for text classification on English and Indonesian social media datasets. On small datasets, the word-level models outperformed the character-level models. However, on a dataset with millions of data points, the character-level model outperformed the word-level model. Further analysis of the evaluation of word-level and character-level models is also discussed in this paper.
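To make the distinction between character-level and word-level features concrete, the following minimal sketch shows one way a text could be turned into integer sequences at either granularity before being fed to a CNN or Bi-RNN. It is an illustration only and not the paper's preprocessing: the function names (`word_features`, `char_features`, `build_vocab`, `encode`), the lowercasing, the whitespace tokenization, and the padding/unknown ids are all assumptions.

```python
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed ids for padding and unknown tokens


def word_features(text):
    """Word-level tokens: lowercase, then split on whitespace (an assumption)."""
    return text.lower().split()


def char_features(text):
    """Character-level tokens: every character, spaces included."""
    return list(text.lower())


def build_vocab(token_lists, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tok, n in sorted(counts.items()):
        if n >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(t, UNK_ID) for t in tokens[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    train = ["good morning", "good night"]
    word_vocab = build_vocab([word_features(t) for t in train])
    char_vocab = build_vocab([char_features(t) for t in train])

    noisy = "gooood morning"  # a user-generated spelling not seen in training
    print(encode(word_features(noisy), word_vocab, 4))
    print(encode(char_features(noisy), char_vocab, 16))
```

In this toy setup the elongated word `gooood` is absent from the word vocabulary and maps to the unknown id, whereas every one of its characters is still in the character vocabulary. This is one commonly cited reason character-level inputs can be attractive for noisy social media text; the sketch does not reproduce the paper's experiments or results.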
