Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

ORTHROS: Non-autoregressive End-to-end Speech Translation with Dual-decoder


Abstract

Fast inference is an important goal for the real-world deployment of speech translation (ST) systems. End-to-end (E2E) models based on the encoder-decoder architecture are more suitable for this goal than traditional cascaded systems, but their effectiveness regarding decoding speed has not been explored so far. Inspired by recent progress in non-autoregressive (NAR) methods in text-based translation, which generate target tokens in parallel by eliminating conditional dependencies, we study the problem of NAR decoding for E2E-ST. We propose a novel NAR E2E-ST framework, Orthros, in which both NAR and autoregressive (AR) decoders are jointly trained on a shared speech encoder. The AR decoder is used to select the better translation among candidates of various lengths generated by the NAR decoder, which dramatically improves the effectiveness of a large length beam with negligible overhead. We further investigate effective length prediction methods from speech inputs and the impact of vocabulary sizes. Experiments on four benchmarks show the effectiveness of the proposed method in improving inference speed while maintaining competitive translation quality compared to state-of-the-art AR E2E-ST systems.
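The candidate-selection idea in the abstract can be illustrated with a minimal sketch: several candidates of different lengths (standing in for NAR decoder outputs) are scored by an AR model, and the highest-scoring one is kept. Everything below is an assumption made for illustration, not the authors' implementation: the names select_candidate, ar_sequence_score and toy_log_prob are hypothetical, the bigram table is a toy stand-in for a real AR decoder, and the length-normalized scoring rule is assumed because the abstract does not specify one.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: returns log P(token | prefix) under an AR model.
ARScorer = Callable[[Sequence[str], str], float]


def ar_sequence_score(tokens: Sequence[str], log_prob: ARScorer) -> float:
    """Sum token log-probabilities left to right, given the full candidate."""
    return sum(log_prob(tokens[:i], tok) for i, tok in enumerate(tokens))


def select_candidate(
    candidates: List[List[str]],
    log_prob: ARScorer,
    length_normalize: bool = True,  # assumed scoring rule, not from the paper
) -> List[str]:
    """Return the candidate with the highest AR score."""
    def score(cand: List[str]) -> float:
        total = ar_sequence_score(cand, log_prob)
        return total / max(len(cand), 1) if length_normalize else total

    return max(candidates, key=score)


# Toy bigram probabilities standing in for a trained AR decoder.
BIGRAM = {
    ("<s>", "the"): 0.9,
    ("the", "cat"): 0.8,
    ("cat", "sleeps"): 0.9,
    ("the", "the"): 0.05,
}


def toy_log_prob(prefix: Sequence[str], token: str) -> float:
    prev = prefix[-1] if prefix else "<s>"
    # Unseen bigrams get a small fallback probability.
    return math.log(BIGRAM.get((prev, token), 0.01))


# Candidates of lengths 2, 3 and 4, as a length beam might produce.
candidates = [
    ["the", "cat"],
    ["the", "cat", "sleeps"],
    ["the", "the", "cat", "sleeps"],  # contains a repeated token
]
best = select_candidate(candidates, toy_log_prob)
print(best)
```

In this toy setup the length-3 candidate is selected, because the truncated candidate has a lower per-token score and the candidate with the repeated token is penalized by the bigram table. Since each candidate is already complete, a real AR decoder could score all of its positions in a single pass, which is consistent with the abstract's claim of negligible overhead.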
