Journal: International Journal of Computer Processing of Languages

Dictation of Japanese Speech Based on Kana and Kanji Character String

Abstract

In this paper, a character-based Japanese dictation method is proposed. This method is based on the kana and kanji string language model proposed by Ito et al. First, sentences in the training corpus are split into character-based units (CBUs). Then strings of CBUs (CBUSes) are chosen from the CBU corpus based on a statistical criterion. We examined three criteria for CBUS selection: frequency-based selection, mutual-information-based selection, and their combination. The experimental results showed that the combined method gave the best result (7.19% and 8.75% CBU error rates for the 20k and 60k word vocabulary conditions, respectively), which was better than the ordinary word-based method (7.61% and 9.15% CBU error rates for the same conditions, respectively).

In addition, we carried out a recognition experiment on the Corpus of Spontaneous Japanese to confirm that the proposed method is effective not only for read speech but also for spontaneous speech. The best result (29.82%) was obtained using the frequency-based method, which is better than the word-based recognition result (32.80%).
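The three selection criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulas or thresholds. The sketch assumes that each non-space character is one CBU, that only two-CBU strings are considered, that mutual information is measured as pointwise mutual information between adjacent CBUs, and that the "combination" keeps strings passing both tests. The function names, the thresholds `min_count` and `min_pmi`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def split_into_cbus(sentence):
    """Split a sentence into CBUs. Assumption: one CBU per non-space character."""
    return [ch for ch in sentence if not ch.isspace()]


def pair_statistics(corpus):
    """Count single CBUs and adjacent CBU pairs over the whole corpus."""
    unit_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        cbus = split_into_cbus(sentence)
        unit_counts.update(cbus)
        pair_counts.update(zip(cbus, cbus[1:]))
    return unit_counts, pair_counts


def pointwise_mi(pair, unit_counts, pair_counts, total_units, total_pairs):
    """Pointwise mutual information (in bits) of two adjacent CBUs."""
    p_xy = pair_counts[pair] / total_pairs
    p_x = unit_counts[pair[0]] / total_units
    p_y = unit_counts[pair[1]] / total_units
    return math.log2(p_xy / (p_x * p_y))


def select_cbus_strings(corpus, criterion="combined", min_count=2, min_pmi=1.0):
    """Select two-CBU strings by frequency, by PMI, or by both (assumed combination)."""
    unit_counts, pair_counts = pair_statistics(corpus)
    total_units = sum(unit_counts.values())
    total_pairs = sum(pair_counts.values())
    selected = []
    for pair, count in pair_counts.items():
        frequent = count >= min_count
        pmi = pointwise_mi(pair, unit_counts, pair_counts, total_units, total_pairs)
        associated = pmi >= min_pmi
        if criterion == "frequency":
            keep = frequent
        elif criterion == "mi":
            keep = associated
        else:
            # Hypothetical reading of "combination": both tests must pass.
            keep = frequent and associated
        if keep:
            selected.append("".join(pair))
    return selected


# Toy corpus for illustration only; it is not data from the paper.
example_corpus = ["東京に行く", "東京は広い", "京都に行く"]
frequent_strings = select_cbus_strings(example_corpus, criterion="frequency")
combined_strings = select_cbus_strings(example_corpus, criterion="combined")
```

On this toy corpus a pair that occurs only once can still have high pointwise mutual information, because both of its characters are rare. The frequency test filters such pairs out, which is one plausible reason for combining the two criteria.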
