Journal: Computer Technology and Development (《计算机技术与发展》)

Research on Chinese Word Segmentation Based on Sequence Labeling with Recurrent Neural Networks

         

Abstract

Word segmentation is a key technology in Chinese natural language processing, and sequence labeling plays an important role in it. The current mainstream Chinese word segmentation methods are based on supervised learning and extract feature information from Chinese text. However, these methods do not make full use of context information when segmenting Chinese, and they lack the ability to capture long-distance constraints. To address this, Chinese word segmentation is performed with a bidirectional recurrent neural network model under a sequence-labeling formulation. This removes the limit that a fixed window places on context size and gives each word access to the context both before and after it. According to the authors, adding this context effectively alleviates the exploding and vanishing gradient problems. Pre-trained contextual word vectors are then added at the input layer, which yields relatively good segmentation results. Experimental results show that the method reaches a Chinese word segmentation accuracy of 97.3%, a notable improvement over traditional machine-learning segmentation algorithms.
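The abstract frames segmentation as per-character sequence labeling. A bidirectional recurrent network is fed vectors at the input layer. The block below is a minimal, untrained sketch of that pipeline in plain Python; it is not the authors' implementation. Several choices are assumptions for illustration only, because the abstract does not specify them:

- the BMES tag set;
- plain tanh (Elman) cells;
- the random weights;
- the random vectors standing in for pre-trained embeddings;
- every helper name (`words_to_tags`, `tags_to_words`, `rnn_pass`, `birnn`, `predict_tags`, `segment`).

```python
import math
import random

# Assumed tag set (not named in the abstract): B = begin, M = middle, E = end, S = single-character word.
TAGS = ("B", "M", "E", "S")


def words_to_tags(words):
    """Turn a segmented sentence (list of words) into one BMES tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and BMES tags, tolerating ill-formed tag sequences."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # close a word the tags left open
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def random_params(n_in, n_hidden, rng):
    """Hypothetical untrained weights (input, recurrent, bias) for one RNN direction."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
    return w_in, w_rec, [0.0] * n_hidden


def rnn_pass(inputs, w_in, w_rec, bias):
    """Plain tanh (Elman) RNN: h_t = tanh(W_in x_t + W_rec h_{t-1} + b); returns every h_t."""
    hidden, states = [0.0] * len(bias), []
    for x in inputs:
        hidden = [
            math.tanh(
                sum(a * b for a, b in zip(w_in[i], x))
                + sum(a * b for a, b in zip(w_rec[i], hidden))
                + bias[i]
            )
            for i in range(len(bias))
        ]
        states.append(hidden)
    return states


def birnn(inputs, fwd_params, bwd_params):
    """Concatenate the forward (left-context) and backward (right-context) state at each position."""
    fwd = rnn_pass(inputs, *fwd_params)
    bwd = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def predict_tags(states, w_out):
    """Linear output layer followed by an argmax over the four tags at each character."""
    tags = []
    for h in states:
        scores = [sum(w * v for w, v in zip(row, h)) for row in w_out]
        tags.append(TAGS[scores.index(max(scores))])
    return tags


def segment(chars, embeddings, fwd_params, bwd_params, w_out):
    """Look up each character's vector, run the BiRNN, tag every character, decode to words."""
    inputs = [embeddings[c] for c in chars]
    states = birnn(inputs, fwd_params, bwd_params)
    return tags_to_words(chars, predict_tags(states, w_out))


# Usage with untrained weights: the gold tags are exact, the predicted split is arbitrary.
rng = random.Random(0)
dim, hid = 8, 6
sentence = ["我", "爱", "北京", "天安门"]
chars = list("".join(sentence))
# Hypothetical random vectors standing in for the pre-trained vectors fed to the input layer.
embeddings = {c: [rng.uniform(-1, 1) for _ in range(dim)] for c in dict.fromkeys(chars)}
fwd, bwd = random_params(dim, hid, rng), random_params(dim, hid, rng)
w_out = [[rng.uniform(-1, 1) for _ in range(2 * hid)] for _ in TAGS]
print(words_to_tags(sentence))  # ['S', 'S', 'B', 'E', 'B', 'M', 'E']
print(segment(chars, embeddings, fwd, bwd, w_out))
```

The key design point is in `birnn`. Each character's representation joins two states: a forward state that summarizes everything to its left, and a backward state that summarizes everything to its right. Because of this, no fixed context window is needed. In a real system the weights and embeddings would be trained on a labeled corpus, and the cell type may well differ (for example, LSTM units). In this sketch the predicted segmentation carries no meaning.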
