Journal: IEEE Transactions on Audio, Speech, and Language Processing

Improved Recognition of Spontaneous Hungarian Speech—Morphological and Acoustic Modeling Techniques for a Less Resourced Task

Abstract

Various morphological and acoustic modeling techniques are evaluated on a less-resourced, spontaneous Hungarian large-vocabulary continuous speech recognition (LVCSR) task. Among morphologically rich languages, Hungarian is known for its agglutinative, inflective nature, which increases the data sparseness caused by a relatively small training database. Although Hungarian spelling is considered simple and phonological, a large part of the corpus is covered by words pronounced in multiple, phonemically different ways. Data-driven vocabulary decomposition methods and methods supported by language-specific knowledge are investigated in combination with phoneme- and grapheme-based acoustic modeling techniques on the given task. Both statistical and grammatical vocabulary decomposition methods significantly outperform the word baseline and the morph-based advanced baseline. Although the discussed morph-based techniques recognize a significant number of out-of-vocabulary words, the improvements are due not to this fact but to the reduction of insertion errors. Applying grapheme-based acoustic models instead of phoneme-based models causes no severe deterioration in recognition performance. Moreover, a fully data-driven acoustic modeling technique combined with a statistical morphological modeling approach provides the best performance on the most difficult test set. The overall best speech recognition performance is obtained by using a novel word-to-morph decomposition technique that combines grammatical and unsupervised statistical segmentation algorithms. The improvement achieved by the proposed technique is stable across acoustic modeling approaches and is larger with speaker adaptation.
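
To illustrate why decomposing words into morphs can cover out-of-vocabulary words in an agglutinative language, here is a minimal Python sketch. It is not the decomposition technique evaluated in the paper: the hand-written morph lexicon, the toy word lists, and the function names `segment` and `oov_rate` are all assumptions made up for this example. It only shows the coverage effect; the abstract attributes the actual accuracy gains to fewer insertion errors, which this sketch does not model.

```python
from typing import List, Optional, Sequence, Set

# Hypothetical toy data (assumptions for illustration, not from the paper).
WORD_VOCAB: Set[str] = {"ház", "házak", "házban", "kert", "kertek", "kertben"}
MORPHS: Set[str] = {"ház", "kert", "ak", "ek", "ban", "ben"}
TEST_WORDS: List[str] = ["házakban", "kertekben", "kert"]


def segment(word: str, morphs: Set[str]) -> Optional[List[str]]:
    """Split `word` into known morphs, preferring the fewest pieces.

    Returns None when no full segmentation exists.
    """
    # best[i] holds the shortest segmentation of word[:i], or None.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in morphs:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[len(word)]


def oov_rate(tokens: Sequence[str], vocab: Set[str]) -> float:
    """Fraction of tokens that are missing from `vocab`."""
    return sum(token not in vocab for token in tokens) / len(tokens)


def morph_oov_rate(words: Sequence[str], morphs: Set[str]) -> float:
    """Fraction of words that cannot be fully built from known morphs."""
    return sum(segment(w, morphs) is None for w in words) / len(words)


if __name__ == "__main__":
    for w in TEST_WORDS:
        print(w, "->", segment(w, MORPHS))
    print("word-level OOV rate:", oov_rate(TEST_WORDS, WORD_VOCAB))
    print("morph-level OOV rate:", morph_oov_rate(TEST_WORDS, MORPHS))
```

In this toy setup, two of the three test words are unseen as whole words but can still be assembled from known morphs, which is the coverage effect the abstract refers to.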
