Introspection for convolutional automatic speech recognition

Abstract

Artificial Neural Networks (ANNs) have achieved great success in the past few years. As these models grow more complex, their decision processes become harder to understand. Introspection techniques have therefore been proposed, mostly for image inputs, where patterns or relevant regions can be interpreted intuitively by a human observer. This is not the case for more complex data such as speech recordings. In this work, we investigate how common introspection techniques from computer vision apply to an Automatic Speech Recognition (ASR) task. To this end, we use a model similar to those used for image classification, which predicts letters from spectrograms. We show the difficulties of applying image introspection to ASR. To tackle these problems, we propose normalized averaging of aligned inputs (NAvAI): a data-driven method that reveals the patterns a model has learned for predicting specific classes. Our method integrates information from many data examples by means of local introspection techniques for Convolutional Neural Networks (CNNs). We demonstrate that our method makes letter-specific patterns more interpretable than existing methods do.
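The abstract describes NAvAI only at a high level. To illustrate the general idea (cut a fixed-width window from each spectrogram around the frame where a given letter is predicted, weight it by a relevance map from a local introspection method, normalize each example, then average over many examples), here is a minimal Python/NumPy sketch. The function name `navai_sketch`, centring the window on the prediction frame, elementwise relevance weighting, zero padding at the edges and max-abs normalization are all assumptions made for illustration. The paper's actual procedure may differ.

```python
import numpy as np


def navai_sketch(inputs, relevances, positions, half_width):
    """Hypothetical sketch of normalized averaging of aligned inputs.

    inputs:      list of 2-D arrays (freq_bins x time_frames), e.g. spectrograms
    relevances:  relevance maps with the same shapes as `inputs`, produced by
                 some local introspection method (assumed)
    positions:   per example, the time frame at which the target letter was
                 predicted; used as the alignment point (assumed)
    half_width:  number of frames kept on each side of the alignment point
    Returns an array of shape (freq_bins, 2 * half_width + 1).
    """
    if not inputs:
        raise ValueError("need at least one example")
    if not (len(inputs) == len(relevances) == len(positions)):
        raise ValueError("inputs, relevances and positions must align")
    window = 2 * half_width + 1
    total = None
    for x, r, t in zip(inputs, relevances, positions):
        x = np.asarray(x, dtype=float)
        r = np.asarray(r, dtype=float)
        if x.shape != r.shape:
            raise ValueError("input and relevance map must have the same shape")
        if not 0 <= t < x.shape[1]:
            raise ValueError("alignment frame out of range")
        # Zero-pad along time so windows near the edges keep full width.
        pad = ((0, 0), (half_width, half_width))
        xp, rp = np.pad(x, pad), np.pad(r, pad)
        # Original frame t sits at index t + half_width after padding,
        # so the window centred on it spans padded indices [t, t + window).
        weighted = xp[:, t:t + window] * rp[:, t:t + window]
        # Per-example normalization (assumed: divide by max absolute value).
        scale = np.abs(weighted).max()
        if scale > 0:
            weighted = weighted / scale
        total = weighted if total is None else total + weighted
    # Average the normalized, aligned windows over all examples.
    return total / len(inputs)
```

Normalizing each example before averaging keeps a single loud recording from dominating the result. Other choices, such as a per-frequency z-score, would fit the same aligned-averaging scheme.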