IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation


Abstract

We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and an effective training method for it based on knowledge distillation. Common E2E-ASR models have mainly focused on utterance-level processing, in which each utterance is transcribed independently. Large-context E2E-ASR models, by contrast, take long-range sequential contexts beyond utterance boundaries into account and can therefore handle sequences of utterances, such as discourses and conversations, well. However, the transformer architecture, which has recently achieved state-of-the-art performance among utterance-level ASR systems, has not yet been introduced into large-context ASR systems. We expect that the transformer architecture can be leveraged to effectively capture not only input speech contexts but also long-range sequential contexts beyond utterance boundaries. This paper therefore proposes a hierarchical transformer-based large-context E2E-ASR model that combines the transformer architecture with hierarchical encoder-decoder based large-context modeling. In addition, to enable the proposed model to use long-range sequential contexts, we also propose a large-context knowledge distillation method that distills knowledge from a pre-trained large-context language model during training. We evaluate the effectiveness of the proposed model and training method on Japanese discourse ASR tasks.
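The abstract does not specify the exact form of the distillation objective. A common formulation, sketched below purely as an assumption, interpolates the standard cross-entropy loss on ground-truth tokens with a temperature-scaled KL-divergence term that pulls the ASR model's per-token output distribution toward the soft targets produced by the pre-trained large-context language model; the function names and the `alpha`/`temperature` hyperparameters are illustrative, not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw scores."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) distillation term for one token position.

    The teacher here stands in for the pre-trained large-context LM,
    the student for the large-context E2E-ASR decoder.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the LM
    q = softmax(student_logits, temperature)  # ASR model's distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ce_loss(student_logits, target_index):
    """Standard cross-entropy against the ground-truth token."""
    q = softmax(student_logits)
    return -math.log(q[target_index])

def combined_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=2.0):
    """Interpolate hard-label CE with the distillation term.

    The T**2 factor is the usual gradient-scale correction from
    Hinton-style knowledge distillation.
    """
    return ((1 - alpha) * ce_loss(student_logits, target_index)
            + alpha * (temperature ** 2)
            * kd_loss(student_logits, teacher_logits, temperature))
```

In such a setup the distillation term vanishes when the two distributions agree, so the loss reduces to plain cross-entropy; during training the term injects the LM's long-range contextual preferences into the ASR model without requiring the LM at inference time.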
