Journal: Neurocomputing

Improving text classification with weighted word embeddings via a multi-channel TextCNN model

Abstract

In recent years, convolutional neural networks (CNNs) have gained considerable attention in text classification because of the remarkably good performance they have achieved in various settings. The usual practice is to first perform word embedding (i.e., map each word to a word vector) and then employ a CNN to perform classification. Term weighting approaches have proven quite effective at improving classification accuracy, but to the best of our knowledge, almost all of these methods assign only one weight to each term (word). Because a term generally has different importance in documents with different class labels, we propose a novel term weighting scheme that is combined with word embeddings to enhance the classification performance of CNNs. In the proposed method, multiple weights are assigned to each term, and these weights are applied separately to the word embeddings. The transformed features are then fed into a multi-channel CNN model to predict the label of the sentence. In a comparison with several baseline methods on five benchmark data sets, the classification accuracy of the proposed method exceeds that of the other methods by a large margin. Moreover, the weights assigned by different weighting schemes are analyzed to gain more insight into their working mechanisms. (C) 2019 Elsevier B.V. All rights reserved.
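The idea of assigning several weights to each term and applying them separately to the word embeddings can be illustrated with a small sketch. The abstract does not give the weighting formula, so the weight used here (the share of a term's occurrences that fall in each class) is purely an assumption for illustration, as are the function names `class_term_weights` and `build_channels` and the toy embeddings. The sketch stops at building one weighted sentence matrix per class; in the paper's setting these matrices would be the input channels of the CNN.

```python
from collections import Counter


def class_term_weights(docs, labels):
    """Hypothetical weighting (not the paper's formula): for each term,
    the fraction of its occurrences that fall in each class."""
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = set().union(*(counts[c].keys() for c in classes))
    weights = {}
    for term in vocab:
        total = sum(counts[c][term] for c in classes)
        weights[term] = [counts[c][term] / total for c in classes]
    return classes, weights


def build_channels(tokens, embeddings, weights, n_classes):
    """Return one weighted copy of the sentence matrix per class (channel)."""
    channels = []
    for k in range(n_classes):
        channel = []
        for tok in tokens:
            # Terms unseen during weighting get a uniform weight (assumption).
            w = weights.get(tok, [1.0 / n_classes] * n_classes)[k]
            channel.append([w * x for x in embeddings[tok]])
        channels.append(channel)
    return channels


# Toy training data and 2-dimensional embeddings, invented for illustration.
docs = [["great", "movie"], ["boring", "movie"], ["great", "plot"]]
labels = ["pos", "neg", "pos"]
embeddings = {
    "great": [1.0, 2.0],
    "movie": [0.5, -1.0],
    "boring": [-2.0, 0.0],
    "plot": [0.0, 1.0],
}

classes, weights = class_term_weights(docs, labels)
channels = build_channels(["great", "movie"], embeddings, weights, len(classes))
```

With this toy data, "great" appears only in positive documents, so its embedding is zeroed in the "neg" channel and kept unchanged in the "pos" channel, while "movie" is scaled equally in both. A conventional single-weight scheme would instead produce one scaled matrix, which is the limitation the abstract points to.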
