
CHINESE WORD SEGMENTATION METHOD AND APPARATUS BASED ON DEEP LEARNING, AND STORAGE MEDIUM AND COMPUTER DEVICE


Abstract

A deep-learning-based method and apparatus for Chinese word segmentation. The method comprises: converting training corpus data into character-level data; converting the character-level data into sequence data; splitting the sequence data at preset symbols to obtain a plurality of sub-sequences, and grouping the sub-sequences by length to obtain K data sets; using the K data sets to obtain K trained time sequence convolutional neural network-conditional random field models; and inputting the processed target corpus data into at least one of the K trained models to obtain a word segmentation result for the target corpus data. The method is intended to address the low accuracy of Chinese word segmentation in the prior art.
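
The preprocessing steps in the abstract (character-level conversion, sequence conversion, splitting at preset symbols, grouping by length into K data sets) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the punctuation set used as "preset symbols", the integer-ID encoding used as "sequence data", the length boundaries, and all function names. Model training is not shown.

```python
from bisect import bisect_left

# Assumed delimiter set; the abstract only says "pre-set symbols".
PRESET_SYMBOLS = "，。！？；、,.!?;"


def to_characters(text):
    """Character-level data: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def build_vocab(chars):
    """Map each distinct character to an integer ID (assumed encoding)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for ch in chars:
        vocab.setdefault(ch, len(vocab))
    return vocab


def to_sequence(chars, vocab):
    """Sequence data: the character stream as a list of integer IDs."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in chars]


def split_on_symbols(sequence, delimiter_ids):
    """Cut the ID sequence after every delimiter, keeping the delimiter."""
    pieces, current = [], []
    for token in sequence:
        current.append(token)
        if token in delimiter_ids:
            pieces.append(current)
            current = []
    if current:  # trailing text with no closing delimiter
        pieces.append(current)
    return pieces


def group_by_length(pieces, boundaries):
    """Bucket sub-sequences by length into K = len(boundaries) + 1 groups.

    `boundaries` holds ascending inclusive upper bounds, so (4, 8) gives
    three groups: length <= 4, length 5..8, and length > 8.
    """
    groups = [[] for _ in range(len(boundaries) + 1)]
    for piece in pieces:
        groups[bisect_left(boundaries, len(piece))].append(piece)
    return groups


def preprocess(text, boundaries=(4, 8)):
    """Run the four preprocessing steps and return (groups, vocab)."""
    chars = to_characters(text)
    vocab = build_vocab(chars)
    sequence = to_sequence(chars, vocab)
    delimiter_ids = {vocab[s] for s in PRESET_SYMBOLS if s in vocab}
    pieces = split_on_symbols(sequence, delimiter_ids)
    return group_by_length(pieces, list(boundaries)), vocab


if __name__ == "__main__":
    groups, _ = preprocess("今天天气很好，我们去公园散步吧。好！")
    for k, group in enumerate(groups):
        print(f"data set {k}: {len(group)} sub-sequence(s)")
```

In this sketch each of the K groups would then serve as the training set for one model, so that every model sees sub-sequences of similar length. How the boundaries are chosen, and how a target sub-sequence is matched to one or more of the K models, is not specified in the abstract.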
