International Conference on Asian Language Processing

Burmese Word Segmentation Method and Implementation Based on CRF

Abstract

Burmese is one of the languages whose writing system has no delimiters to mark word boundaries, yet work on Burmese word segmentation is still at an early stage. This paper aims to fill that gap by applying a conditional random field (CRF) model to the task. The performance of the CRF method is evaluated by confidence and segmentation precision. We prepared an experimental dataset of 5,000 sentences, manually segmented by Burmese experts. Under 6-fold cross-validation on this dataset, the CRF method achieves an average confidence level of 93.4%, which exceeds the threshold, and an average F1 of 93.0%. The CRF segmentation method therefore satisfies the requirements for developing a Burmese speech synthesis system.
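The evaluation protocol described in the abstract (6-fold cross-validation scored by F1) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say how F1 is computed or how the folds are drawn, so the sketch assumes word-level F1 over exact-match word spans and contiguous, unshuffled folds. The function names `to_spans`, `prf1`, and `k_fold_indices` are hypothetical, the "confidence" measure is not defined in the abstract and is left out, and the CRF model itself is not shown.

```python
from typing import Iterator, List, Set, Tuple

Sentence = List[str]  # one segmented sentence: a list of words


def to_spans(words: Sentence) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to a set of (start, end) offsets.

    Offsets are counted in Python code points. Assumption: a real Burmese
    system may count syllables or grapheme clusters instead.
    """
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def prf1(gold: List[Sentence], pred: List[Sentence]) -> Tuple[float, float, float]:
    """Word-level precision, recall, and F1; a word counts only if both
    of its boundaries match the gold segmentation."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = to_spans(g), to_spans(p)
        correct += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def k_fold_indices(n: int, k: int = 6) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds over n items.

    Assumption: no shuffling; the first n % k folds get one extra item.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    # Toy data with Latin placeholders standing in for Burmese text.
    gold = [["ab", "c", "de"]]
    pred = [["ab", "cd", "e"]]
    print(prf1(gold, pred))
    for train, test in k_fold_indices(5000, k=6):
        print(len(train), len(test))
```

In a full experiment, each fold would train a segmenter on the `train` indices, predict the `test` sentences, and pass the gold and predicted segmentations to `prf1`; the per-fold F1 values would then be averaged to give a figure comparable to the one reported above.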