Conference proceedings: International Conference on Computational Linguistics

Automatic Syllabification for Manipuri language


Abstract

Developing hand-crafted rules for syllabifying the words of a language is an expensive task. This paper proposes several data-driven methods for automatic syllabification of words written in the Manipuri language, one of the scheduled languages of India. First, we propose a language-independent rule-based approach formulated using entropy-based phonotactic segmentation. Second, we cast syllabification as a sequence labeling problem and evaluate several sequence labeling approaches. Third, we combine sequence labeling with the rule-based method and evaluate the performance of the resulting hybrid approach. The experiments show that the proposed methods outperform the baseline rule-based method: entropy-based phonotactic segmentation achieves a word accuracy of 96%, a CRF (a sequence labeling approach) achieves 97%, and the hybrid approach achieves 98%.
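To make the sequence labeling formulation and the word accuracy metric concrete, here is a minimal Python sketch. The abstract does not give the paper's tag set or features, so the two-tag B/I scheme is an assumption chosen for illustration: `B` marks a character that begins a syllable and `I` marks any other character. The function names are hypothetical, and the example strings are placeholders rather than Manipuri data.

```python
from typing import List, Tuple


def to_labels(syllables: List[str]) -> Tuple[List[str], List[str]]:
    """Turn a syllabified word into parallel lists of characters and labels.

    Assumed tag set (not taken from the paper): 'B' for the first
    character of a syllable, 'I' for every other character.
    """
    chars: List[str] = []
    labels: List[str] = []
    for syllable in syllables:
        for i, ch in enumerate(syllable):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def from_labels(chars: List[str], labels: List[str]) -> List[str]:
    """Rebuild the syllable list from characters and their B/I labels."""
    syllables: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not syllables:
            syllables.append(ch)   # start a new syllable
        else:
            syllables[-1] += ch    # extend the current syllable
    return syllables


def word_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Fraction of words whose entire syllabification matches the gold one."""
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Placeholder strings, not real Manipuri words.
chars, labels = to_labels(["ab", "c"])
restored = from_labels(chars, labels)
```

Under this encoding, a sequence labeler such as a CRF would predict one label per character, and `from_labels` would turn its output back into syllables. Word accuracy counts a word as correct only if every boundary in it is correct, which makes it stricter than per-character label accuracy.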
