
Sentence segmentation for classical Chinese based on LSTM with radical embedding


Abstract

A sub-character feature embedding, called radical embedding, is proposed and applied to a long short-term memory (LSTM) model for sentence segmentation of pre-modern Chinese texts. The dataset includes over 150 classical Chinese books from three different dynasties and covers a range of literary styles. The LSTM-conditional random field (LSTM-CRF) model is a state-of-the-art method for sequence labeling problems; the proposed model extends it with a radical embedding component, which improves performance. Experimental results on the aforementioned Chinese books show better accuracy than earlier sentence segmentation methods, especially on Tang dynasty epitaph texts (an F1-score of 81.34%).
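The abstract frames sentence segmentation as sequence labeling over characters, with radical features added to the input. The sketch below illustrates one plausible way to prepare such input; it is not the authors' implementation. Everything in it is an assumption for illustration: the two-tag scheme (`B` = a boundary follows this character, `O` = no boundary), the set of punctuation marks treated as boundaries, the toy `RADICALS` table, and the choice to concatenate a character vector with a radical vector. The LSTM and CRF layers that would consume these vectors are omitted.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Assumption: every mark in this set is treated as a segment boundary.
BOUNDARY_MARKS = set("，。；：！？、")

# Hypothetical toy character-to-radical table; a real system would
# need a full radical dictionary covering the corpus.
RADICALS: Dict[str, str] = {"江": "氵", "河": "氵", "林": "木", "松": "木", "明": "日"}

UNK = "<unk>"  # fallback key for unseen characters or radicals


def to_labeled_sequence(text: str) -> Tuple[List[str], List[str]]:
    """Strip punctuation and tag each remaining character:
    'B' if a boundary mark followed it, 'O' otherwise (hypothetical scheme)."""
    chars: List[str] = []
    tags: List[str] = []
    for ch in text:
        if ch in BOUNDARY_MARKS:
            if tags:  # ignore marks that precede any character
                tags[-1] = "B"
        else:
            chars.append(ch)
            tags.append("O")
    return chars, tags


def build_table(symbols: Iterable[str], dim: int, seed: int) -> Dict[str, List[float]]:
    """Randomly initialised lookup table (stands in for a trained embedding)."""
    rng = random.Random(seed)
    table: Dict[str, List[float]] = {}
    for s in sorted(set(symbols)) + [UNK]:
        table[s] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table


def embed(chars: List[str],
          char_table: Dict[str, List[float]],
          rad_table: Dict[str, List[float]]) -> List[List[float]]:
    """Concatenate each character's vector with its radical's vector,
    producing the per-position input an LSTM layer would receive."""
    vectors: List[List[float]] = []
    for ch in chars:
        rad = RADICALS.get(ch, UNK)
        c_vec = char_table.get(ch, char_table[UNK])
        r_vec = rad_table.get(rad, rad_table[UNK])
        vectors.append(c_vec + r_vec)  # list concatenation, not addition
    return vectors


if __name__ == "__main__":
    chars, tags = to_labeled_sequence("江河明，松林静。")
    char_table = build_table(chars, dim=4, seed=0)
    rad_table = build_table(RADICALS.values(), dim=2, seed=1)
    for ch, tag, vec in zip(chars, tags, embed(chars, char_table, rad_table)):
        print(ch, tag, len(vec))
```

In this sketch, characters sharing a radical (here 江 and 河, both 氵) receive identical radical sub-vectors, which is the kind of shared signal a radical embedding is meant to provide for rare characters; characters missing from the toy table fall back to the `<unk>` radical vector.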

Bibliographic details

  • Source
    The Journal of China Universities of Posts and Telecommunications (《中国邮电高校学报:英文版》), 2019, Issue 2, pp. 1-8 (8 pages)
  • Author affiliations

    School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

    Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876, China

    The Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content, Institute of Scientific and Technical Information of China, Beijing 100038, China

    Institute for Quantitative Social Science, Harvard University, Cambridge, MA, USA

    Department of Statistics, Harvard University, Cambridge, MA, USA


  • Original format: PDF
  • Text language: Chinese
  • Chinese Library Classification: Radio electronics, telecommunications technology
  • Keywords

    LSTM; radical; embedding; sentence; segmentation;
