Introspection for convolutional automatic speech recognition


Abstract

Artificial Neural Networks (ANNs) have experienced great success in the past few years. The increasing complexity of these models makes their decision processes harder to understand. Introspection techniques have therefore been proposed, mostly for image input data. Patterns or relevant regions in images can be interpreted intuitively by a human observer. This is not the case for more complex data such as speech recordings. In this work, we investigate the application of common introspection techniques from computer vision to an Automatic Speech Recognition (ASR) task. To this end, we use a model similar to those used for image classification, which predicts letters from spectrograms. We show the difficulties of applying image introspection to ASR. To tackle these problems, we propose normalized averaging of aligned inputs (NAvAI): a data-driven method to reveal the patterns learned for predicting specific classes. Our method integrates information from many data examples through local introspection techniques for Convolutional Neural Networks (CNNs). We demonstrate that our method provides better interpretability of letter-specific patterns than existing methods.
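The abstract names the idea behind NAvAI (normalize, align, average) without giving details, so the following is only a minimal sketch of that general idea and not the authors' implementation. It assumes each example is a small matrix (rows are frequency bins, columns are time frames), which could be a spectrogram or a per-example relevance map from a local introspection technique. It also assumes alignment means cropping a fixed window around the frame where the class was predicted, and that normalization is min-max scaling. The function names `crop_aligned`, `normalize`, and `average_aligned` are hypothetical.

```python
from typing import List, Sequence

# Assumed layout: rows = frequency bins, columns = time frames.
Matrix = List[List[float]]


def crop_aligned(spec: Matrix, center: int, half_width: int) -> Matrix:
    """Cut a fixed-width time window around `center`, zero-padding at the edges."""
    n_frames = len(spec[0])
    return [
        [
            row[t] if 0 <= t < n_frames else 0.0
            for t in range(center - half_width, center + half_width + 1)
        ]
        for row in spec
    ]


def normalize(window: Matrix) -> Matrix:
    """Min-max scale a window to [0, 1]; a constant window maps to all zeros."""
    values = [v for row in window for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in window]
    return [[(v - lo) / span for v in row] for row in window]


def average_aligned(
    specs: Sequence[Matrix], centers: Sequence[int], half_width: int
) -> Matrix:
    """Average normalized windows, each aligned on its own center frame."""
    windows = [
        normalize(crop_aligned(spec, center, half_width))
        for spec, center in zip(specs, centers)
    ]
    n_rows, n_cols = len(windows[0]), len(windows[0][0])
    return [
        [sum(w[r][c] for w in windows) / len(windows) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Two toy single-bin examples aligned on different frames.
spec_a = [[0.0, 2.0, 4.0]]
spec_b = [[1.0, 1.0, 3.0, 1.0]]
pattern = average_aligned([spec_a, spec_b], centers=[1, 2], half_width=1)
```

Normalizing each window before averaging keeps a single loud example from dominating the result, which is one plausible reason to normalize before aggregating over many examples.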


Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2018 | xviii, 386 p. | 13 pages
Authors
Andreas Krug; Sebastian Stober
Original format: PDF
Chinese Library Classification: Programming, software engineering
Date indexed: 2022-08-20 23:27:06

