Conference: International Computer Engineering Conference

An Innovative Word Encoding Method For Text Classification Using Convolutional Neural Network

Abstract

Text classification plays a vital role today, especially given the intensive use of social networking media. Recently, different convolutional neural network architectures have been used for text classification, commonly with one-hot vectors or word embeddings as input. This paper presents a new language-independent word encoding method for text classification. The proposed model converts raw text data to a low-level feature dimension with minimal or no preprocessing by using a new approach called binary unique number of word (BUNOW). BUNOW assigns each unique word an integer ID in a dictionary, and that ID is represented as a k-dimensional vector of its binary equivalent. The output vector of this encoding is fed into a convolutional neural network (CNN) model for classification. Moreover, the proposed model reduces the number of neural network parameters and allows faster computation with few network layers: a word is the atomic unit of the document, as in word-level representations, and memory consumption is lower than for character-level representations. The proposed CNN model can work with other languages or multilingual text without any changes to the encoding method. The model outperforms character-level and very deep character-level CNN models in terms of accuracy, network parameters, and memory consumption. On the AG's News dataset, it achieves a total classification accuracy of 91.99% (error 8.01%), compared with 91.45% (error 8.55%) for state-of-the-art methods, while reducing the input feature vector and the neural network parameters by 62% and 34%, respectively.
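
The encoding step described in the abstract can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names, whitespace tokenization, IDs starting at 1, the reserved ID 0 for padding and unknown words, the most-significant-bit-first ordering, and the fixed document length `max_words` are all assumptions made for the example.

```python
def build_vocab(docs):
    """Give each unique word an integer ID in order of first appearance."""
    vocab = {}
    for doc in docs:
        for word in doc.split():  # assumption: plain whitespace tokenization
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # assumption: 0 is reserved for padding/unknown
    return vocab


def bits_needed(vocab_size):
    """Smallest k such that every ID in 0..vocab_size fits in k bits."""
    return max(1, vocab_size.bit_length())


def encode_word(word_id, k):
    """Return the k-dimensional binary vector of word_id, most significant bit first."""
    return [(word_id >> shift) & 1 for shift in range(k - 1, -1, -1)]


def encode_document(doc, vocab, k, max_words):
    """Encode a document as a max_words x k binary matrix (truncated or zero-padded)."""
    ids = [vocab.get(word, 0) for word in doc.split()][:max_words]
    ids += [0] * (max_words - len(ids))
    return [encode_word(word_id, k) for word_id in ids]


docs = ["stocks rise on earnings", "team wins on penalty"]
vocab = build_vocab(docs)          # 7 unique words, IDs 1..7
k = bits_needed(len(vocab))        # 3 bits are enough for IDs 0..7
matrix = encode_document(docs[1], vocab, k, max_words=5)
print(k, matrix)
# prints: 3 [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
```

In this toy case each word occupies 3 binary values instead of the 7 a one-hot vector over the same vocabulary would need. A matrix of this shape is the kind of input that would then be passed to the CNN.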