Conference: International Conference on Knowledge and Smart Technology

A character-level convolutional neural network with dynamic input length for Thai text categorization

Abstract

A Character-level Convolutional Neural Network (Char-CNN) is an efficient text categorization method. It can perform categorization without a word segmentation step, which traditional methods require for Thai. However, existing Char-CNN models use a fixed input length and must cut off any characters beyond it, which may cause important content to be lost. In this paper, we propose a new Char-CNN model that accepts input of any length by applying k-max pooling before the fully connected layer. The results show that our model outperforms a fixed-input-length Char-CNN on Thai news categorization. Moreover, our proposed method is more accurate than several word-level methods (Naive Bayes, Logistic Regression, and Support Vector Machine), though it does not outperform a word-level CNN.
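The central idea in the abstract is that k-max pooling turns a feature map of any length into a fixed number of values, so the fully connected layer no longer dictates the input length. The plain-Python sketch below illustrates that mechanism under stated assumptions. k-max pooling follows its common definition (keep the k largest activations in their original order). The scalar character encoding, the filter weights, the zero-padding of maps shorter than k, and all function names (`k_max_pooling`, `conv1d_valid`, `encode_chars`, `char_cnn_features`) are hypothetical. The paper's actual architecture and hyperparameters are not given in this abstract.

```python
from typing import List, Sequence


def k_max_pooling(feature_map: Sequence[float], k: int) -> List[float]:
    """Return the k largest values of a 1-D feature map in their original order.

    Maps shorter than k are right-padded with 0.0 (an assumption made for
    this sketch, not taken from the paper).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank positions by value (largest first; ties go to the earlier position).
    ranked = sorted(range(len(feature_map)), key=lambda i: (-feature_map[i], i))
    # Re-sort the k winning positions so the output preserves sequence order.
    kept = sorted(ranked[:k])
    pooled = [float(feature_map[i]) for i in kept]
    return pooled + [0.0] * (k - len(pooled))


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D cross-correlation (no padding), as in a CNN conv layer."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)  # empty if signal < kernel
    ]


def encode_chars(text: str) -> List[float]:
    """Hypothetical toy encoding: one scalar per character.

    Real Char-CNNs use one-hot vectors or learned embeddings per character;
    a scalar keeps the sketch short.
    """
    return [(ord(ch) % 97) / 97.0 for ch in text]


def char_cnn_features(text: str, kernels: Sequence[Sequence[float]], k: int) -> List[float]:
    """Conv -> ReLU -> k-max pooling for each filter, concatenated.

    The output length is len(kernels) * k however long `text` is, so it can
    feed a fully connected layer of fixed size without truncating the input.
    """
    signal = encode_chars(text)
    features: List[float] = []
    for kernel in kernels:
        activations = [max(0.0, v) for v in conv1d_valid(signal, kernel)]
        features.extend(k_max_pooling(activations, k))
    return features


if __name__ == "__main__":
    print(k_max_pooling([3.0, 1.0, 5.0, 2.0, 4.0], k=3))  # → [3.0, 5.0, 4.0]
    kernels = [[0.5, -0.2, 0.3], [1.0, 1.0]]  # hypothetical filter weights
    for text in ["ข่าว", "ข่าวกีฬาวันนี้ทีมชาติไทยชนะ"]:
        # Feature length is always 2 filters * k=4 = 8, for any text length.
        print(len(text), len(char_cnn_features(text, kernels, k=4)))
```

Keeping the selected values in positional order, rather than sorting them by magnitude, is what distinguishes k-max pooling from simply taking the top-k scores; it retains some information about where strong features occurred in the text.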
