Journal: International Journal of Technology and Human Interaction

Learning Chinese Word Segmentation Based on Bidirectional GRU-CRF and CNN Network Model

Abstract

Chinese word segmentation is the basis of Chinese natural language processing (NLP). With the development of deep learning, various neural network models have been applied to Chinese word segmentation. However, current neural network models for this task still rely on manual feature extraction, use nonstandard word weights, cannot make effective use of long-distance information, and take a long time to train. To address these problems, this article presents a CNN-Bidirectional GRU-CRF neural network model (CNN Bidirectional GRU CRF Network, CBiGCN), which breaks through the window limit of conventional methods and realizes end-to-end processing. It applies a five-tag set method and a bias-variable-weight greedy strategy to the neural network model, supplemented by the Goldstein-Armijo guidelines. The model also has a simple structure and is easy to operate. It learns features automatically, reduces the amount of task-specific knowledge needed in the form of handcrafted features and data pre-processing, and makes effective use of context information. The authors evaluate their system in an experiment on two Chinese word segmentation corpora. The experiment shows that the new model obtains better segmentation results and greatly reduces training time.
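To make the tagging formulation concrete, the following is a minimal sketch of how per-character tags can be turned back into words. The abstract does not list the five tags, so the sketch assumes only the common convention that B opens a word, E closes it, S marks a single-character word, and every other tag (for example B2 or M) is an interior position. The function name tags_to_words is hypothetical and is not taken from the article.

```python
def tags_to_words(chars, tags):
    """Group characters into words from per-character position tags.

    Assumed convention (not specified in the abstract): "B" starts a word,
    "E" ends it, "S" is a single-character word, and any other tag
    (e.g. "B2", "M") marks an interior character.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], []
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts, so flush any unfinished word first.
            words.append("".join(current))
            current = []
        current.append(ch)
        if tag in ("E", "S"):
            # The current word is complete.
            words.append("".join(current))
            current = []
    if current:
        # Keep trailing characters even if the tag sequence was cut short.
        words.append("".join(current))
    return words


four_words = tags_to_words("中文分词是基础", ["B", "E", "B", "E", "S", "B", "E"])
one_word = tags_to_words("自然语言处理", ["B", "B2", "M", "M", "M", "E"])
```

Because interior tags are treated uniformly, the same decoder works for a four-tag or a five-tag scheme. In a model such as the one described, the tag sequence would come from the CRF layer's decoding step rather than being written by hand.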
