Foreign-language journal: 電子情報通信学会技術研究報告. 音声. Speech

A study on language model based on kana and kanji string



Abstract

This paper describes a character-based n-gram language model. The proposed model is based on Kanji and Kana characters instead of words or morphemes determined by morphological analysis. To strengthen the model, character strings are used in addition to single characters as its basic units. We examined two methods for choosing character strings. One method is based on frequency in the training corpus, and the other is based on mutual information as well as frequency. We carried out experiments to compare perplexities and character error rates (CER) between the proposed model and conventional (word-based or character-based) n-gram models. The results showed that the mutual-information-based method gave better performance. Although the proposed model was not superior to the word-based model, it was better than the character-based one. The vocabulary size of the proposed model was about 50 smaller than that of the word-based model.
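
The two selection criteria named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: candidate strings are restricted to adjacent character pairs, mutual information is taken to be pointwise mutual information (PMI) between the two characters, and the function name `select_units` and the thresholds `min_count` and `min_pmi` are hypothetical. It is not the authors' procedure.

```python
import math
from collections import Counter


def select_units(corpus, min_count=2, min_pmi=None):
    """Pick adjacent character pairs to add as multi-character units.

    corpus    : list of strings (e.g. sentences of kana and kanji)
    min_count : frequency threshold (hypothetical parameter)
    min_pmi   : if given, also require PMI >= min_pmi (hypothetical parameter)

    Returns a dict mapping each selected string to (count, pmi).
    """
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        # Adjacent character pairs within one sentence.
        pair_counts.update(zip(sentence, sentence[1:]))

    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())

    selected = {}
    for (a, b), count in pair_counts.items():
        # Criterion 1: frequency in the training corpus.
        if count < min_count:
            continue
        # Criterion 2 (optional): PMI(a, b) = log2(P(a, b) / (P(a) * P(b))).
        p_ab = count / n_pairs
        p_a = char_counts[a] / n_chars
        p_b = char_counts[b] / n_chars
        pmi = math.log2(p_ab / (p_a * p_b))
        if min_pmi is None or pmi >= min_pmi:
            selected[a + b] = (count, pmi)
    return selected


# Frequency only, then frequency combined with a PMI threshold.
toy_corpus = ["abab", "abcd"]
by_frequency = select_units(toy_corpus, min_count=2)
by_both = select_units(toy_corpus, min_count=1, min_pmi=3.0)
```

In this sketch PMI on its own tends to favour rare pairs, since a pair of characters that each occur only once scores highly. That is one plausible reason for combining mutual information with a frequency threshold rather than using it alone.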
