Experiments on Character and Word Level Features for Text Classification Using Deep Neural Network

Abstract

Text classification is the task of automatically assigning text documents to one or more classes according to their content. Recently, character-level models using deep neural networks have been developed for text classification. In some cases, character-level models have outperformed word-level models and traditional models, especially on user-generated datasets. The topologies used for character-level models are convolutional neural networks (CNN) and bidirectional recurrent neural networks (Bi-RNN) with their variants, long short-term memory (LSTM) and gated recurrent units (GRU). In this paper, CNN, Bi-RNN, and a combination of both are tested with character-level features and word-level features for text classification on English and Indonesian social media datasets. On small datasets, word-level models outperformed character-level models. However, on a dataset with millions of samples, the character-level model outperformed the word-level model. This paper also discusses a further analysis of the evaluation of word-level and character-level models.
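As a minimal illustration of the difference between the two feature types, the Python sketch below (standard library only) encodes the same text at the word level and at the character level, then applies a single convolution filter with max-over-time pooling, the basic operation a CNN text classifier applies to embedded sequences. It is not the authors' implementation: the reserved indices `PAD` and `UNK`, the helper names, the naive whitespace tokenizer, the toy texts, and the sequence lengths are all illustrative assumptions, and the Bi-RNN branch is not sketched.

```python
from collections import Counter

# Hypothetical reserved indices (not from the paper): 0 pads short
# sequences, 1 stands for any token missing from the vocabulary.
PAD, UNK = 0, 1


def build_vocab(tokenized_docs, min_count=1):
    """Assign indices >= 2 to tokens seen at least min_count times."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def char_tokens(text):
    """Character-level features: each lower-cased character is a token."""
    return list(text.lower())


def word_tokens(text):
    """Word-level features: a naive lower-cased whitespace split (assumed)."""
    return text.lower().split()


def encode(tokens, vocab, max_len):
    """Map tokens to indices, truncating or right-padding to max_len."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def embed(ids, table):
    """Embedding lookup: table maps each index to a vector (list of floats)."""
    return [table[i] for i in ids]


def conv1d_max_pool(vectors, kernel):
    """Slide one filter along the sequence, apply ReLU, then take the max.

    kernel holds `width` weight vectors, each the same size as an
    embedding vector; each position's score is a dot product over the window.
    """
    width = len(kernel)
    if len(vectors) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for t in range(len(vectors) - width + 1):
        s = sum(
            w * x
            for k in range(width)
            for w, x in zip(kernel[k], vectors[t + k])
        )
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # max-over-time pooling


if __name__ == "__main__":
    docs = ["good movie", "bad movie!!"]  # toy training texts
    word_vocab = build_vocab([word_tokens(d) for d in docs])
    char_vocab = build_vocab([char_tokens(d) for d in docs])

    unseen = "good movies!"
    print(encode(word_tokens(unseen), word_vocab, 4))
    # -> [3, 1, 0, 0]  ("movies!" is entirely unknown)
    print(encode(char_tokens(unseen), char_vocab, 12))
    # -> [8, 11, 11, 6, 2, 10, 11, 12, 9, 7, 1, 3]  (only "s" is unknown)
```

In this toy run, the unseen word `movies!` collapses to a single unknown index at the word level, while at the character level only the unseen letter `s` does. That robustness to spelling variation is a common motivation for character-level models on user-generated text; by itself it does not decide which representation performs better, which the abstract reports depends on dataset size.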

Bibliographic record

Source: International Conference on Informatics and Computing | 2018 | pp. 1-6 | 6 pages
Authors: Muhammad Gumilang; Ayu Purwarianti
Original format: PDF
Keywords: Logic gates; Twitter; Feature extraction; Text categorization; Data models; Tagging; Neural networks

机译：逻辑门; Twitter;特征提取;文本分类;数据模型;标记;神经网络;

Similar literature

1. Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification [J]. Physica A: Statistical Mechanics and its Applications. 2020

2. ASCII Art Classification based on Deep Neural Networks Using Image Feature of Characters [J]. Kazuyuki Matsumoto, Akira Fujisawa, Minoru Yoshida. Journal of Software. 2018, no. 10

3. Character-level text classification via convolutional neural network and gated recurrent unit [J]. Liu Bing, Zhou Yong, Sun Wei. International Journal of Machine Learning and Cybernetics. 2020, no. 8

4. Experiments on Character and Word Level Features for Text Classification Using Deep Neural Network [C]. Muhammad Gumilang, Ayu Purwarianti. International Conference on Informatics and Computing. 2018

5. Deep Neural Language Model for Text Classification Based on Convolutional and Recurrent Neural Networks [D]. Abdalraouf Hassan. 2018

6. Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method [O]. Nizar Ahmed, Fatih Dilmaç, Adil Alpkocak. 2020

7. ASCII Art Classification based on Deep Neural Networks Using Image Feature of Characters [O]. Kazuyuki Matsumoto, Akira Fujisawa, Minoru Yoshida. 2018
