Conference: Chinese Lexical Semantics Workshop

A Sentence Segmentation Method for Ancient Chinese Texts Based on NNLM

Abstract

Most ancient Chinese texts lack punctuation and sentence segmentation. Recent research on automatic sentence segmentation of ancient Chinese has usually relied on sequence labelling models and used small data sets. In this paper, we propose a sentence segmentation method for ancient Chinese texts based on neural network language models (NNLM). Experiments on large-scale corpora indicate that our method is effective and achieves results comparable to those of the traditional CRF model. Adding a sentence length penalty, using larger Simplified Chinese corpora, or dividing corpora by era can further improve the performance of our model.
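The abstract does not spell out how a language model is turned into a segmenter, so the following is only a minimal sketch of one plausible reading: score every candidate sentence with a language model and pick the split with the best total score by dynamic programming. Everything here is an assumption made for illustration, not the authors' implementation. In particular, `BigramLM` is a toy character bigram model standing in for an NNLM, and `segment`, `max_len`, `target_len`, and `penalty` are hypothetical names. The linear length penalty is one simple choice among many.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


class BigramLM:
    """Toy character bigram model with add-one smoothing.

    Assumption: this stands in for a neural network language model;
    any object with a logprob(sentence) method could be used instead.
    """

    def __init__(self, sentences):
        self.bigrams = Counter()
        self.contexts = Counter()
        vocab = {EOS}
        for s in sentences:
            chars = [BOS] + list(s) + [EOS]
            vocab.update(s)
            for a, b in zip(chars, chars[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen characters

    def logprob(self, sentence):
        """Log-probability of a sentence, including its end-of-sentence marker."""
        chars = [BOS] + list(sentence) + [EOS]
        total = 0.0
        for a, b in zip(chars, chars[1:]):
            num = self.bigrams[(a, b)] + 1
            den = self.contexts[a] + self.vocab_size
            total += math.log(num / den)
        return total


def segment(text, lm, max_len=10, target_len=4, penalty=0.1):
    """Split text into sentences maximizing LM score minus a length penalty.

    Assumption: the penalty is linear in the distance between a sentence's
    length and target_len; the source does not specify its penalty.
    """
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)              # back[i]: start of the last sentence
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            sent = text[start:end]
            score = (best[start] + lm.logprob(sent)
                     - penalty * abs(len(sent) - target_len))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the chosen sentences.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


# Tiny illustrative training set (lines from the Analects).
lm = BigramLM(["學而時習之", "不亦說乎", "有朋自遠方來", "不亦樂乎"])
pieces = segment("學而時習之不亦說乎", lm)
print(pieces)
```

Because each sentence ends with an end-of-sentence token, the model pays a cost for every boundary it introduces, which is what lets a pure language model decide where sentences stop. The length penalty then discourages sentences that are implausibly short or long.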