Conference: IEEE Applied Imagery Pattern Recognition Workshop
End-to-End Text Classification via Image-based Embedding using Character-level Networks

Abstract

When analysing or understanding languages that have no explicit word boundaries, such as Japanese, Chinese and Thai, on the basis of morphological analysis, it is desirable to perform appropriate word segmentation before computing word embeddings. However, segmentation is inherently difficult in these languages. In recent years, various language models based on deep learning have made remarkable progress, and some of these methods avoid the segmentation problem by using character-level features. However, when a model is fed character-level features of these languages, it often overfits because of the large number of character types. In this paper, we propose CE-CLCNN, a character-level convolutional neural network with a character encoder, to tackle these problems. The proposed CE-CLCNN is an end-to-end learning model with an image-based character encoder, i.e. it handles each character in the target document as an image. Through various experiments, we confirmed that CE-CLCNN embeds visually and semantically similar characters close to one another and achieves state-of-the-art results on several open document classification tasks. In this paper, we report the performance of CE-CLCNN on the Wikipedia title estimation task and analyse its internal behaviour.
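
The idea of treating each character as an image and encoding it with convolutions can be illustrated with a toy sketch. This is not the authors' implementation: the 5x5 bitmaps in `TOY_GLYPHS` are hypothetical stand-ins for glyphs rendered from a font, the kernels are fixed random values rather than weights learned end-to-end, and all function names are assumptions made for this illustration.

```python
import random

# Hypothetical 5x5 bitmaps standing in for rendered glyphs (assumption);
# a real pipeline would rasterise each character with a font instead.
TOY_GLYPHS = {
    "日": ["#####",
           "#...#",
           "#####",
           "#...#",
           "#####"],
    "目": ["#####",
           "#####",
           "#...#",
           "#####",
           "#####"],
    "木": ["..#..",
           "#####",
           ".###.",
           "#.#.#",
           "..#.."],
}
SIZE = 5


def char_to_image(ch):
    """Return the character as a SIZE x SIZE grid of 0.0/1.0 pixels."""
    rows = TOY_GLYPHS.get(ch, ["." * SIZE] * SIZE)  # unknown char -> blank image
    return [[1.0 if px == "#" else 0.0 for px in row] for row in rows]


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation without padding."""
    k = len(kernel)
    out = len(image) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]


def encode_char(image, kernels):
    """Convolution -> ReLU -> global max pooling, one value per kernel."""
    embedding = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        embedding.append(max(max(0.0, v) for row in fmap for v in row))
    return embedding


def make_kernels(n_kernels=4, k=3, seed=0):
    """Fixed random kernels (assumption); a trained model would learn these."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(k)]
        for _ in range(n_kernels)
    ]


def encode_text(text, kernels):
    """Map a string to a sequence of per-character image embeddings."""
    return [encode_char(char_to_image(ch), kernels) for ch in text]


kernels = make_kernels()
embeddings = encode_text("日目木", kernels)
print(len(embeddings), len(embeddings[0]))  # prints "3 4"
```

In a full model along the lines the abstract describes, the resulting sequence of character embeddings would feed a character-level CNN classifier, and the encoder and classifier would be trained jointly.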
