Journal: 计算机应用 (Journal of Computer Applications)

Long Text Classification Method Combined with Attention Mechanism

     

Abstract

News texts usually consist of tens to hundreds of sentences. Their large character counts and the considerable amount of topic-irrelevant information they contain degrade classification performance. To address this problem, a long text classification method combined with an attention mechanism is proposed. First, each sentence of a text is represented as a paragraph vector, and a neural network attention model between paragraph vectors and text categories is constructed to compute each sentence's attention. Sentences are then filtered according to their contribution to the categories, which is measured as the mean square error of the sentence attention vector. Finally, a classifier based on a Convolutional Neural Network (CNN) is constructed, with the filtered text and its attention matrix each taken as network input. Max pooling is used for feature filtering, and random dropout is used to reduce overfitting. Experiments were conducted on the dataset of the Chinese news text classification task, one of the shared tasks of Natural Language Processing and Chinese Computing (NLP&CC) 2014. When the filtered text was 82.74% of the length of the text before filtering, the proposed method reached an accuracy of 80.39% on 19 news categories, 2.1% higher than the accuracy obtained on the unfiltered text. The experimental results show that, by combining an attention mechanism with sentence filtering and the classification model, the proposed method improves the accuracy of long text classification while performing sentence-level information filtering.
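The sentence filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the attention values are given as one row per sentence and one column per category; the "mean square error of the sentence attention vector" is read here as the population standard deviation of a sentence's attention across categories; and the function names and the `keep_ratio` parameter are hypothetical. The abstract reports a ratio of text length (82.74%), whereas this sketch keeps a fraction of sentences.

```python
from statistics import pstdev


def sentence_contributions(attention):
    """Score each sentence by the spread of its attention over categories.

    attention[i][c] is the attention of sentence i toward category c.
    Assumption: the contribution is the population standard deviation of
    the row, so a sentence that attends evenly to all categories scores low.
    """
    return [pstdev(row) for row in attention]


def filter_sentences(sentences, attention, keep_ratio=0.8):
    """Keep the highest-scoring sentences, preserving document order.

    keep_ratio is a hypothetical parameter (a fraction of sentences kept),
    not a value taken from the paper.
    """
    if len(sentences) != len(attention):
        raise ValueError("one attention row is required per sentence")
    scores = sentence_contributions(attention)
    n_keep = max(1, round(len(sentences) * keep_ratio))
    # Rank sentence indices by contribution, then restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [sentences[i] for i in kept], [attention[i] for i in kept]


if __name__ == "__main__":
    # Toy data: four sentences, three categories (made-up values).
    toy_sentences = ["s0", "s1", "s2", "s3"]
    toy_attention = [
        [0.90, 0.05, 0.05],  # strongly tied to one category
        [0.34, 0.33, 0.33],  # nearly uniform, little category signal
        [0.10, 0.80, 0.10],  # strongly tied to one category
        [0.30, 0.40, 0.30],  # weak category signal
    ]
    kept_sentences, kept_attention = filter_sentences(
        toy_sentences, toy_attention, keep_ratio=0.5
    )
    print(kept_sentences)
```

In this reading, the retained sentences and their attention rows would then serve as the two inputs to the CNN classifier mentioned in the abstract.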
