Journal: Speech Communication

Korean large vocabulary continuous speech recognition with morpheme-based recognition units

Abstract

In Korean writing, spaces separate adjacent word-phrases, each of which generally corresponds semantically to two or three English words. If the word-phrase is used as the recognition unit for Korean large vocabulary continuous speech recognition (LVCSR), the out-of-vocabulary (OOV) rate becomes very high. If a morpheme or a syllable is used instead, a severe inter-morpheme coarticulation problem arises because the morphemes are short. We propose using a merged morpheme as the recognition unit, together with pronunciation-dependent entries in the language model (LM), to reduce these difficulties and to incorporate between-word phonological rules into the decoding algorithm of a Korean LVCSR system. Starting from the original morpheme units defined in Korean morphology, we merge pairs of short and frequent morphemes into larger units using a rule-based method and a statistical method. We define the merged morpheme unit as a word and use it as the recognition unit. The performance of the system was evaluated on two business-related tasks: a read speech recognition task and a broadcast news transcription task. The OOV rate was reduced to a level comparable to that of American English in both tasks. In the read speech recognition task, with a 32k vocabulary and a word-based trigram LM computed from a newspaper text corpus, the word error rate (WER) of the baseline system was reduced from 25.0% to 20.0% by cross-word modeling and pronunciation-dependent language modeling, and finally to 15.5% by enlarging the speech database and text corpora. For the broadcast news transcription task, the statistical method gave a 3.4% relative WER reduction over the baseline system without morpheme merging, and the two proposed merging methods yielded similar performance. Applying all the proposed techniques, we achieved 17.6% WER for clean speech and 27.7% for noisy speech.
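
As a rough illustration of merging pairs of short and frequent morphemes, the sketch below does a single greedy pass over a morpheme-segmented corpus. The abstract does not state the actual merging criterion, so everything here is an assumption: the pure frequency threshold, the length limit measured in characters, the `+` joiner, and the names `merge_frequent_pairs` and `relative_reduction` are all hypothetical. The tokens are made-up romanized placeholders, not a real morphological analysis. The second helper only spells out the arithmetic behind a relative WER reduction, using the 25.0% and 20.0% figures quoted in the abstract.

```python
from collections import Counter


def merge_frequent_pairs(sentences, max_len=2, min_count=3, joiner="+"):
    """Merge adjacent morpheme pairs that are both short and frequent.

    Assumption: "short" means at most `max_len` characters and "frequent"
    means the pair occurs at least `min_count` times in the corpus.
    The paper's real statistical criterion is not given in the abstract.
    """
    # Count adjacent pairs in which both morphemes are short.
    pair_counts = Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            if len(left) <= max_len and len(right) <= max_len:
                pair_counts[(left, right)] += 1

    to_merge = {pair for pair, n in pair_counts.items() if n >= min_count}

    # Rewrite each sentence left to right, merging selected pairs greedily.
    merged_corpus = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in to_merge:
                out.append(sent[i] + joiner + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged_corpus.append(out)
    return merged_corpus


def relative_reduction(before, after):
    """Relative reduction of an error rate, as a fraction of `before`."""
    return (before - after) / before


# Toy corpus of placeholder tokens (hypothetical, for illustration only).
corpus = [
    ["hakgyo", "e", "ga", "ss", "da"],
    ["jib", "e", "ga", "ss", "da"],
    ["hakgyo", "e", "o", "ss", "da"],
]
merged = merge_frequent_pairs(corpus)
print(merged[0])  # ['hakgyo', 'e', 'ga', 'ss+da']

# 25.0% -> 20.0% WER is a 5.0-point absolute, 20% relative reduction.
print(round(relative_reduction(25.0, 20.0), 3))  # 0.2
```

In this toy corpus only the pair ("ss", "da") reaches the threshold of three occurrences, so it becomes the single merged unit "ss+da". A larger recognition unit of this kind is what the abstract argues reduces inter-morpheme coarticulation problems while keeping the OOV rate low.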
