A study on language model based on kana and kanji string

Hiroaki Kinno; Masaharu Katoh; Tetsuo Kosaka; Masaki Kohda; Akinori Ito


Abstract

This paper describes a character-based n-gram language model. The proposed model is built on Kanji and Kana characters instead of words or morphemes determined by morphological analysis. To strengthen the model, character strings are used in addition to single characters as its basic units. We examined two methods for choosing character strings: one is based on frequency in the training corpus, and the other is based on mutual information as well as frequency. We carried out experiments comparing perplexity and character error rate (CER) between the proposed model and conventional (word-based or character-based) n-gram models. The results showed that the mutual-information-based method performed better than the frequency-based one. Although the proposed model was not superior to the word-based model, it outperformed the character-based one. The vocabulary size of the proposed model was about 50 smaller than that of the word-based model.
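The two ways of choosing character strings mentioned in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the abstract does not give the exact criteria, so the code assumes two-character strings only, uses pointwise mutual information between adjacent characters, and applies a hypothetical `min_count` frequency threshold before ranking. All function names and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def score_pairs(corpus, min_count=2):
    """Return (pair, count, pmi) for adjacent character pairs in the corpus.

    Assumption: candidate strings are limited to two characters, and pairs
    seen fewer than `min_count` times are dropped (hypothetical threshold).
    """
    chars = Counter()
    pairs = Counter()
    for sentence in corpus:
        chars.update(sentence)
        pairs.update(sentence[i:i + 2] for i in range(len(sentence) - 1))

    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    scored = []
    for pair, count in pairs.items():
        if count < min_count:
            continue
        p_xy = count / n_pairs
        p_x = chars[pair[0]] / n_chars
        p_y = chars[pair[1]] / n_chars
        # Pointwise mutual information of the two adjacent characters.
        pmi = math.log2(p_xy / (p_x * p_y))
        scored.append((pair, count, pmi))
    return scored


def select_by_frequency(scored, k):
    """Pick the k most frequent pairs (ties broken by the pair string)."""
    ranked = sorted(scored, key=lambda t: (-t[1], t[0]))
    return [pair for pair, _, _ in ranked[:k]]


def select_by_mutual_information(scored, k):
    """Pick the k pairs with the highest PMI among sufficiently frequent ones."""
    ranked = sorted(scored, key=lambda t: (-t[2], t[0]))
    return [pair for pair, _, _ in ranked[:k]]


if __name__ == "__main__":
    toy_corpus = ["東京に行く", "東京で働く", "京都に行く"]  # illustrative data only
    scored = score_pairs(toy_corpus)
    print(select_by_frequency(scored, 3))
    print(select_by_mutual_information(scored, 1))
```

The selected strings would then be added to the single-character vocabulary, and the training text re-segmented into these units before estimating the n-gram model; that step is omitted here.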

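Bibliographic record

Source
電子情報通信学会技術研究報告. 音声. Speech (IEICE Technical Report, Speech), 2002, No. 530, pp. 1-6 (6 pages)
Authors
Hiroaki Kinno; Masaharu Katoh; Tetsuo Kosaka; Masaki Kohda; Akinori Ito
Indexing information
Original format: PDF
Language of text: Japanese
Chinese Library Classification: Telegraphy, facsimile
Keywords
Morphemic analysis; Language model; Character string; Frequency; Mutual information
Date added: 2024-01-25 19:47:13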
