International Conference on Information and Communication Technology

Indonesian Graphemic Syllabification Using n-Gram Tagger with State-Elimination

Abstract

Syllabification can be approached in either a grapheme-based or a phoneme-based way. Graphemic syllabification is simpler than phonemic syllabification because it does not require grapheme-to-phoneme (G2P) conversion. Both phonemic and graphemic syllabification have previously been applied to Indonesian words, with average syllable error rates (SER) of 0.64% and 2.27%, respectively, so Indonesian graphemic syllabification performs considerably worse than its phonemic counterpart. This research aims to improve Indonesian graphemic syllabification using a syllable boundary tagger based on a statistical n-gram model. In fivefold cross-validation on 50k formal Indonesian words, the proposed model achieves an average SER of 0.94%, and the introduced state-elimination procedure further reduces the SER to 0.92%, much lower than the 2.27% of the previous Indonesian graphemic syllabification. Most syllabification errors come from derived words and adapted foreign terms.
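The abstract does not spell out how the n-gram boundary tagger works internally. One common way to build such a tagger, shown here only as a minimal sketch, is to treat a syllable boundary as an extra token (`-`) in the grapheme stream, train a character n-gram model on syllabified words, and decode a new word with a Viterbi search over where boundaries may be inserted. The class name `NGramSyllabifier`, the add-k smoothing, the padding symbols `^`/`$`, and the defaults `n=3`, `k=0.1` are all hypothetical choices for illustration, not the authors' configuration. The paper's state-elimination procedure is not described in the abstract and is not modeled here.

```python
import math
from collections import defaultdict


class NGramSyllabifier:
    """Hypothetical grapheme n-gram tagger: '-' is an extra token marking a syllable boundary."""

    def __init__(self, n=3, k=0.1):
        self.n, self.k = n, k  # n-gram order and add-k smoothing constant (assumed values)
        self.counts = defaultdict(lambda: defaultdict(int))  # context tuple -> next token -> count
        self.vocab = set()

    def _pad(self, tokens):
        # '^' pads the start context, '$' marks the end of the word
        return ["^"] * (self.n - 1) + tokens + ["$"]

    def train(self, syllabified_words):
        """Count n-grams over words written with '-' between syllables, e.g. 'ma-kan'."""
        for word in syllabified_words:
            toks = self._pad(list(word))
            self.vocab.update(toks)
            for i in range(self.n - 1, len(toks)):
                self.counts[tuple(toks[i - self.n + 1:i])][toks[i]] += 1

    def _logp(self, ctx, tok):
        # Add-k smoothed log P(tok | ctx); an unseen context gives a uniform distribution
        nexts = self.counts.get(ctx, {})
        total = sum(nexts.values())
        return math.log((nexts.get(tok, 0) + self.k) / (total + self.k * len(self.vocab)))

    def syllabify(self, word):
        """Viterbi search over boundary placements; the state is the last n-1 tokens."""
        best = {tuple(["^"] * (self.n - 1)): (0.0, "")}  # state -> (log-prob, output so far)
        for i, ch in enumerate(word):
            nxt = {}
            # Either emit the grapheme directly or insert a boundary before it (never word-initially)
            options = ([ch], ["-", ch]) if i > 0 else ([ch],)
            for state, (score, out) in best.items():
                for emit in options:
                    s, sc = state, score
                    for tok in emit:
                        sc += self._logp(s, tok)
                        s = (s + (tok,))[1:]  # slide the (n-1)-token context window
                    # Keep only the best-scoring path per state (exact under the Markov assumption)
                    if s not in nxt or sc > nxt[s][0]:
                        nxt[s] = (sc, out + "".join(emit))
            best = nxt
        # Close each path with the end-of-word token and return the highest-scoring output
        _, (_, out) = max(best.items(), key=lambda kv: kv[1][0] + self._logp(kv[0], "$"))
        return out
```

For example, after `train(["ma-kan"])`, calling `syllabify("makan")` returns `"ma-kan"`. Keying the search on the last n-1 tokens keeps decoding linear in word length instead of enumerating all 2^(L-1) boundary patterns.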
