Venue: 2017 IEEE Automatic Speech Recognition and Understanding Workshop

Character-based units for unlimited vocabulary continuous speech recognition

Abstract

We study character-based language models in a state-of-the-art speech recognition framework. This approach has advantages over both word-based systems and so-called end-to-end ASR systems, which do not have separate acoustic and language models. We describe the modifications needed to build an effective character-based ASR system using the Kaldi toolkit, and we evaluate models based on words, statistical morphs, and characters for both Finnish and Arabic. The morph-based models yield the best recognition results for both well-resourced and lower-resourced tasks, but the character-based models come close to that performance in the lower-resourced tasks and outperform the word-based models. Character-based models are especially good at predicting novel word forms that were not seen in the training data. Character-based neural network language models are computationally efficient, and they provide a larger gain than neural network language models do in the morph-based and word-based systems.
