IAPR International Conference on Document Analysis and Recognition

Semantic Text Encoding for Text Classification Using Convolutional Neural Networks

Abstract

In this paper, we encode the semantics of a text document in an image so that the same Convolutional Neural Networks (CNNs) that have been successfully employed for image classification can be applied to text. We use Word2Vec, an estimation of word representations in a vector space that preserves the semantic and syntactic relationships among words. Word2Vec vectors are transformed into graphical words representing the sequence of words in the text document. The encoded images are classified using the AlexNet architecture. We introduce a new dataset named Text-Ferramenta, gathered from an Italian price comparison website, and evaluate the encoding scheme on this dataset along with two publicly available datasets, 20news-bydate and StackOverflow. Our scheme outperforms the text classification approach based on Doc2Vec and Support Vector Machine (SVM) when all the words of a text document can be completely encoded in an image. We believe that the results on these datasets are an interesting starting point for many Natural Language Processing works based on CNNs, such as a multimodal approach that could use a single CNN to classify both image and text information.
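The encoding step described above can be illustrated with a minimal sketch. The abstract does not specify the image layout, so everything below is an assumption made for illustration: each word vector is split into consecutive triplets that are read as RGB colours, each colour is drawn as a square superpixel, and the resulting word tiles are placed left to right, top to bottom. The function name `encode_text_as_image`, the parameters `image_size` and `superpixel`, and the random vectors standing in for trained Word2Vec embeddings are all hypothetical.

```python
import numpy as np


def encode_text_as_image(tokens, embeddings, image_size=64, superpixel=4):
    """Pack per-word vectors into an RGB image, one tile per word.

    Assumed layout (not taken from the paper): a vector of length 3*k
    becomes k colours, each drawn as a superpixel x superpixel square.
    Returns the image and a flag telling whether every known word fit.
    """
    dim = len(next(iter(embeddings.values())))
    assert dim % 3 == 0, "vector length must be a multiple of 3 (RGB triplets)"
    n_colors = dim // 3
    tile_h, tile_w = superpixel, n_colors * superpixel
    per_row = image_size // tile_w
    max_words = per_row * (image_size // tile_h)

    # Min-max scale over the whole vocabulary so values map to 0..255.
    all_vals = np.stack([np.asarray(v, dtype=np.float64) for v in embeddings.values()])
    lo, hi = all_vals.min(), all_vals.max()
    scale = (hi - lo) or 1.0  # avoid division by zero for constant vectors

    image = np.zeros((image_size, image_size, 3), dtype=np.uint8)
    known = [t for t in tokens if t in embeddings]  # skip out-of-vocabulary words
    for i, tok in enumerate(known[:max_words]):
        vec = np.asarray(embeddings[tok], dtype=np.float64)
        colors = np.round((vec - lo) / scale * 255).astype(np.uint8).reshape(n_colors, 3)
        # Blow each colour up into a superpixel: (tile_h, tile_w, 3).
        tile = np.repeat(np.repeat(colors[np.newaxis], superpixel, axis=0), superpixel, axis=1)
        r, c = divmod(i, per_row)
        image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = tile
    return image, len(known) <= max_words


# Toy stand-in for trained Word2Vec vectors (random, for illustration only).
rng = np.random.default_rng(0)
vocab = ["cordless", "drill", "with", "battery"]
embeddings = {w: rng.normal(size=12) for w in vocab}

image, complete = encode_text_as_image(
    ["cordless", "drill", "unknown", "battery"], embeddings
)
print(image.shape, complete)
```

The returned flag mirrors the condition stated in the abstract: the scheme is reported to outperform the Doc2Vec and SVM baseline when all the words of a document can be encoded in the image, so a caller may want to know when a document was truncated. The resulting array could then be passed to an image classifier such as AlexNet, which is the architecture the abstract names.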