Journal: Quality Control, Transactions

Towards Understanding Attention-Based Speech Recognition Models

Abstract

Although attention-based speech recognition has achieved promising performance, the specific explanation of its intermediate representations remains a black box. In this paper, we visually show and explain the continuous encoder outputs. We propose a human-intervened forced alignment method to obtain labels for t-distributed stochastic neighbor embedding (t-SNE), and use them to better understand the attention mechanism and the recurrent representations. In addition, we combine t-SNE and canonical correlation analysis (CCA) to analyze the training dynamics of phones in the attention-based model. Experiments are carried out on TIMIT and WSJ. The aligned embeddings of the encoder outputs form sequence manifolds of the ground-truth labels. Plots of the t-SNE embeddings visually show what representations the encoder has shaped and how the attention mechanism works for speech recognition. Comparisons between different models, different layers, and different utterance lengths show that the manifolds have clearer shapes when the outputs come from a deeper encoder layer, a shorter utterance, or a model with better performance. We also observe that the same symbols from different utterances tend to gather at similar positions, which confirms the consistency of our method. Further comparisons are made between different training epochs of the model using t-SNE and CCA. The results show that both plosive and nasal/flap phones converge quickly, while long vowel phones converge slowly.
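The labeling-and-projection step the abstract describes can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the authors' implementation: it projects per-frame encoder outputs to two dimensions with t-SNE and colors each frame by its forced-alignment phone label. The arrays `encoder_outputs` and `phone_labels` are hypothetical placeholders for the model's actual intermediate representations and alignments.

```python
# Minimal sketch: t-SNE of frame-level encoder outputs, colored by
# forced-alignment phone labels. `encoder_outputs` and `phone_labels`
# are hypothetical stand-ins, not the paper's actual data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_frames, hidden_dim = 500, 256
encoder_outputs = rng.normal(size=(n_frames, hidden_dim))  # (frames, hidden_dim)
phone_labels = rng.integers(0, 5, size=n_frames)           # one phone id per frame

# Embed the frame-level representations into 2-D.
embedded = TSNE(n_components=2, perplexity=30,
                random_state=0).fit_transform(encoder_outputs)

# One scatter per phone class so the legend maps colors to phone ids;
# in the paper's setting, well-trained encoders should form per-phone
# clusters and sequence manifolds here.
for phone in np.unique(phone_labels):
    mask = phone_labels == phone
    plt.scatter(embedded[mask, 0], embedded[mask, 1], s=5, label=f"phone {phone}")
plt.legend()
plt.title("t-SNE of encoder outputs, colored by forced-alignment labels")
plt.show()
```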
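For the training-dynamics analysis, CCA offers a way to quantify how much a phone's representation changes between checkpoints. The following hedged sketch computes canonical correlations between the same phone's frame representations taken from two epochs; a high mean correlation suggests the representation has stopped changing (converged). `reps_epoch_a` and `reps_epoch_b` are hypothetical (frames, hidden_dim) arrays, and the paper's exact procedure may differ.

```python
# Hedged sketch: compare one phone's representations across two training
# epochs with CCA. The input arrays are hypothetical placeholders.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n_frames, hidden_dim = 200, 64
reps_epoch_a = rng.normal(size=(n_frames, hidden_dim))
# Simulate a later epoch whose representations changed only slightly.
reps_epoch_b = reps_epoch_a + 0.1 * rng.normal(size=(n_frames, hidden_dim))

n_components = 10
cca = CCA(n_components=n_components)
scores_a, scores_b = cca.fit_transform(reps_epoch_a, reps_epoch_b)

# Correlation of each pair of canonical variates; their mean serves as a
# similarity score between the two epochs' representations.
corrs = [np.corrcoef(scores_a[:, i], scores_b[:, i])[0, 1]
         for i in range(n_components)]
print(f"mean canonical correlation: {np.mean(corrs):.3f}")
```

Tracking this score epoch by epoch, per phone class, is one way to observe the convergence ordering the abstract reports (plosives and nasal/flap phones early, long vowels late).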
