Introspection for convolutional automatic speech recognition


Abstract

Artificial Neural Networks (ANNs) have experienced great success in the past few years. The increasing complexity of these models leads to less understanding about their decision processes. Therefore, introspection techniques have been proposed, mostly for images as input data. Patterns or relevant regions in images can be intuitively interpreted by a human observer. This is not the case for more complex data like speech recordings. In this work, we investigate the application of common introspection techniques from computer vision to an Automatic Speech Recognition (ASR) task. To this end, we use a model similar to image classification, which predicts letters from spectrograms. We show difficulties in applying image introspection to ASR. To tackle these problems, we propose normalized averaging of aligned inputs (NAvAI): a data-driven method to reveal learned patterns for prediction of specific classes. Our method integrates information from many data examples through local introspection techniques for Convolutional Neural Networks (CNNs). We demonstrate that our method provides better interpretability of letter-specific patterns than existing methods.
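The abstract names the method but does not spell out its steps. The following is only a minimal sketch of one plausible reading of "normalized averaging of aligned inputs", not the authors' procedure. It assumes each spectrogram is a 2D array (frequency by time), that an anchor frame is already known for each example (for instance, the frame at which the model predicts the letter of interest), and that normalization means z-scoring each window. The function name `averaged_aligned_inputs` and the parameter `half_width` are hypothetical.

```python
import numpy as np


def averaged_aligned_inputs(spectrograms, anchor_frames, half_width):
    """Average z-normalized spectrogram windows centered on anchor frames.

    Assumptions (not taken from the paper): inputs are 2D arrays of shape
    (frequency, time), alignment is done by centering a fixed-width window
    on a per-example anchor frame, and normalization is a per-window z-score.
    """
    windows = []
    for spec, anchor in zip(spectrograms, anchor_frames):
        start, stop = anchor - half_width, anchor + half_width + 1
        if start < 0 or stop > spec.shape[1]:
            continue  # skip windows that would run past the recording
        window = np.asarray(spec[:, start:stop], dtype=float)
        std = window.std()
        if std == 0:
            continue  # a constant window carries no pattern to normalize
        windows.append((window - window.mean()) / std)
    if not windows:
        raise ValueError("no usable window found for the given anchors")
    # Element-wise mean over all aligned, normalized windows.
    return np.mean(windows, axis=0)
```

Under these assumptions, structure that recurs at the same offset from the anchor survives the averaging, while unaligned variation tends to cancel out, which is the intuition behind pooling many examples for one class.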


Bibliographic details

Source: 1st EMNLP workshop blackboxNLP: analyzing and interpreting neural networks for NLP 2018, 2018, pp. 187-199 (13 pages)
Conference location: Brussels (BE)
Authors: Andreas Krug; Sebastian Stober
Affiliation (both authors): University of Potsdam, Research Focus Cognitive Sciences, Karl-Liebknecht-Str. 24/25, 14476 Potsdam, Germany
Original format: PDF
Language: English

