IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation


Abstract

We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and an effective training method for it based on knowledge distillation. Common E2E-ASR models have mainly focused on utterance-level processing, in which each utterance is transcribed independently. Large-context E2E-ASR models, by contrast, take long-range sequential contexts beyond utterance boundaries into account and can therefore handle sequences of utterances, such as discourses and conversations, well. However, the transformer architecture, which has recently achieved state-of-the-art performance among utterance-level ASR systems, has not yet been introduced into large-context ASR systems. We expect that the transformer architecture can be leveraged to effectively capture not only input speech contexts but also long-range sequential contexts beyond utterance boundaries. This paper therefore proposes a hierarchical transformer-based large-context E2E-ASR model that combines the transformer architecture with hierarchical encoder-decoder based large-context modeling. In addition, to enable the proposed model to use long-range sequential contexts, we also propose a large-context knowledge distillation method that distills knowledge from a pre-trained large-context language model during training. We evaluate the effectiveness of the proposed model and training method on Japanese discourse ASR tasks.
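The abstract does not specify the exact form of the distillation objective. A common formulation, sketched below purely as an assumption, interpolates the standard cross-entropy loss on ground-truth tokens with a temperature-scaled KL-divergence term that pulls the ASR model's per-token output distribution toward the soft targets produced by the pre-trained large-context language model; the function names and the `alpha`/`temperature` hyperparameters are illustrative, not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw scores."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) distillation term for one token position.

    The teacher here stands in for the pre-trained large-context LM,
    the student for the large-context E2E-ASR decoder.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the LM
    q = softmax(student_logits, temperature)  # ASR model's distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ce_loss(student_logits, target_index):
    """Standard cross-entropy against the ground-truth token."""
    q = softmax(student_logits)
    return -math.log(q[target_index])

def combined_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=2.0):
    """Interpolate hard-label CE with the distillation term.

    The T**2 factor is the usual gradient-scale correction from
    Hinton-style knowledge distillation.
    """
    return ((1 - alpha) * ce_loss(student_logits, target_index)
            + alpha * (temperature ** 2)
            * kd_loss(student_logits, teacher_logits, temperature))
```

In such a setup the distillation term vanishes when the two distributions agree, so the loss reduces to plain cross-entropy; during training the term injects the LM's long-range contextual preferences into the ASR model without requiring the LM at inference time.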
