An Innovative Word Encoding Method For Text Classification Using Convolutional Neural Network


Abstract

Text classification plays a vital role today, especially given the intensive use of social networking media. Recently, different convolutional neural network architectures have been used for text classification, most commonly with one-hot vectors or word embeddings as input. This paper presents a new language-independent word encoding method for text classification. The proposed model converts raw text into a low-dimensional feature representation with minimal or no preprocessing, using a new approach called binary unique number of word (BUNOW). BUNOW assigns each unique word an integer ID in a dictionary and represents that ID as a k-dimensional vector of its binary equivalent. The output vector of this encoding is fed into a convolutional neural network (CNN) model for classification. Moreover, the proposed model reduces the number of neural network parameters and allows faster computation with few network layers, since the word is the atomic unit of the document, as in word-level representations, while consuming less memory than character-level representations. The CNN model can work with other languages or multilingual text without any change to the encoding method. The model outperforms character-level and very deep character-level CNN models in terms of accuracy, network parameters, and memory consumption. On the AG's News dataset it achieves a total classification accuracy of 91.99% (error 8.01%), compared with 91.45% (error 8.55%) for the state-of-the-art methods, while reducing the input feature vector and the neural network parameters by 62% and 34%, respectively.
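
The encoding step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the reservation of ID 0 for padding and unknown words, the most-significant-bit-first ordering, and the names build_vocab, encode_word, and encode_document are all assumptions made for the example.

```python
def build_vocab(docs):
    """Give each unique whitespace-delimited word an integer ID.

    IDs start at 1; 0 is reserved here for padding/unknown words (an assumption).
    """
    vocab = {}
    for doc in docs:
        for word in doc.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode_word(word_id, k):
    """Return the k-bit binary form of word_id as a list of 0/1, most significant bit first."""
    return [(word_id >> shift) & 1 for shift in range(k - 1, -1, -1)]


def encode_document(doc, vocab, k, max_len):
    """Map a document to a max_len x k binary matrix, truncating or zero-padding as needed."""
    ids = [vocab.get(word, 0) for word in doc.split()][:max_len]
    ids += [0] * (max_len - len(ids))
    return [encode_word(word_id, k) for word_id in ids]


# Toy corpus (hypothetical data, for illustration only).
docs = ["the cat sat", "the dog sat down"]
vocab = build_vocab(docs)            # {'the': 1, 'cat': 2, 'sat': 3, 'dog': 4, 'down': 5}
k = max(1, len(vocab).bit_length())  # bits needed for the largest ID: 3 for 5 words

matrix = encode_document("the dog barked", vocab, k, max_len=4)
print(matrix)  # [[0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Under these assumptions the vector width k grows only logarithmically with the vocabulary size, whereas a one-hot vector grows linearly, which is consistent with the smaller input the abstract reports. The resulting matrix is what would be passed to the CNN.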


Bibliographic details

Source
International Computer Engineering Conference | 2018 | vii, 265 p. | 6 pages
Authors
Amr Adel Helmy; Yasser M.K. Omar; Rania Hodhod
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords
Text categorization; Feature extraction; Computational modeling; Vocabulary; Encoding; Neural networks; Computer architecture


Similar documents


1. Text classification based on word2vec and convolutional neural networks [J]. Basic & Clinical Pharmacology & Toxicology, 2019, Issue S10.
2. Text classification based on word2vec and convolutional neural networks [J]. Fan Xiaojing, Jiang Mingyang, Pei Zhili. Basic & Clinical Pharmacology & Toxicology, 2019, Issue S1.
3. Comparative Study of Convolutional Neural Network with Word Embedding Technique for Text Classification [J]. Amol C. Adamuthe, Sneha Jagtap. International Journal of Intelligent Systems and Applications, 2019, Issue 8.
4. An Innovative Word Encoding Method For Text Classification Using Convolutional Neural Network [C]. Amr Adel Helmy, Yasser M.K. Omar, Rania Hodhod. International Computer Engineering Conference, 2018.
5. Deep Neural Language Model for Text Classification Based on Convolutional and Recurrent Neural Networks [D]. Hassan, Abdalraouf. 2018.
6. Clinical text classification with rule-based features and knowledge-guided convolutional neural networks [O]. Liang Yao, Chengsheng Mao, Yuan Luo. 2019.
7. Text Classification Based on Convolutional Neural Networks and Word Embedding for Low-Resource Languages: Tigrinya [O]. Awet Fesseha, Shengwu Xiong, Eshete Derb Emiru. 2021.
