Conference proceedings: Spoken dialogue systems for ambient environments

Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach



Abstract

Previous approaches to spontaneous speech recognition address the multiple-pronunciation problem by modeling pronunciation alterations at the phoneme-to-phoneme level. However, they do not yet consider the phonetic transformation effects induced by the pronunciation of the whole sentence. In this paper, we attempt to model sequence-based pronunciation variation using a noisy-channel approach, in which the spontaneous phoneme sequence is treated as a "noisy" string and the goal is to recover the "clean" string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from phonemes to words directly. In this preliminary study, the phonemes are first recognized with the existing recognition system; the pronunciation variation model based on the noisy-channel approach then maps from the phoneme level to the word level. Our experiments use Switchboard as the spontaneous speech corpus. The results show that the proposed method consistently improves word accuracy over the conventional recognition system. The best system achieves up to 38.9% relative improvement over the baseline speech recognizer.
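
For orientation, the noisy-channel idea described in the abstract is commonly written as a Bayesian decoding rule. The formulation below is the generic textbook form, not necessarily the exact model used in the paper. The symbols are assumptions introduced here for illustration: $S$ is the recognized ("noisy") phoneme sequence, $W$ is a candidate ("clean") word sequence, $P(S \mid W)$ is the channel model, and $P(W)$ is the source (language) model.

```latex
% Generic noisy-channel decoding rule (illustrative; symbols are assumptions,
% not notation taken from the paper).
% S = s_1, ..., s_n : phoneme sequence produced by the phoneme recognizer
% W = w_1, ..., w_m : candidate word sequence
\begin{align}
  \hat{W} &= \operatorname*{arg\,max}_{W} \; P(W \mid S) \\
          &= \operatorname*{arg\,max}_{W} \; \frac{P(S \mid W)\, P(W)}{P(S)} \\
          &= \operatorname*{arg\,max}_{W} \; P(S \mid W)\, P(W)
\end{align}
```

The last step drops $P(S)$ because it does not depend on $W$. Under this reading, the channel model would capture how a whole word sequence is realized as spontaneous phonemes, which is where sentence-level pronunciation effects could enter.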
