Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Cascaded Encoders for Unifying Streaming and Non-Streaming ASR


Abstract

End-to-end (E2E) automatic speech recognition (ASR) models have by now shown competitive performance on several benchmarks. These models are structured to operate in either streaming or non-streaming mode. This work presents cascaded encoders for building a single E2E ASR model that can operate in both modes simultaneously. The proposed model consists of a streaming encoder and a non-streaming encoder. Input features are first processed by the streaming encoder; the non-streaming encoder operates exclusively on the output of the streaming encoder. A single decoder then learns to decode using the output of either the streaming or the non-streaming encoder. Results show that this model achieves word error rates (WER) similar to those of a standalone streaming model when operating in streaming mode, and obtains a 10% to 27% relative improvement when operating in non-streaming mode. Our results also show that the proposed approach outperforms existing E2E two-pass models, especially on long-form speech.
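The data flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`streaming_encoder`, `non_streaming_encoder`, `shared_decoder`, `recognize`) are hypothetical, and the running mean, global mean, and argmax are toy stand-ins for the neural layers the paper actually trains. Only the wiring follows the abstract: a causal encoder runs first, a full-context encoder reads only the causal encoder's output, and one decoder consumes either.

```python
from typing import List

Frame = List[float]


def streaming_encoder(features: List[Frame]) -> List[Frame]:
    """Causal toy encoder: output at time t depends only on frames 0..t.

    A running mean stands in for the real streaming layers (assumption).
    """
    out: List[Frame] = []
    running = [0.0] * len(features[0])
    for t, frame in enumerate(features, start=1):
        running = [r + x for r, x in zip(running, frame)]
        out.append([r / t for r in running])
    return out


def non_streaming_encoder(stream_out: List[Frame]) -> List[Frame]:
    """Full-context toy encoder, cascaded on the streaming encoder's output.

    It never sees the raw input features, only `stream_out`. Adding the
    utterance-level mean stands in for layers with future context (assumption).
    """
    n, dim = len(stream_out), len(stream_out[0])
    global_mean = [sum(f[d] for f in stream_out) / n for d in range(dim)]
    return [[x + g for x, g in zip(f, global_mean)] for f in stream_out]


def shared_decoder(encoded: List[Frame]) -> List[int]:
    """Toy stand-in for the single decoder: one label index per frame."""
    return [max(range(len(f)), key=f.__getitem__) for f in encoded]


def recognize(features: List[Frame], mode: str = "streaming") -> List[int]:
    """Run the same decoder on either encoder output, chosen by `mode`."""
    stream_out = streaming_encoder(features)
    if mode == "streaming":
        return shared_decoder(stream_out)
    if mode == "non_streaming":
        return shared_decoder(non_streaming_encoder(stream_out))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
    print(recognize(feats, mode="streaming"))
    print(recognize(feats, mode="non_streaming"))
```

In this sketch the streaming path can emit a label as soon as each frame arrives, while the non-streaming path has to wait for the whole utterance. That is the latency and accuracy trade-off the two modes are meant to cover.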
