Journal: IEEE Transactions on Audio, Speech, and Language Processing

Speech Analysis in a Model of the Central Auditory System

Abstract

Recently, there has been a significant increase in research interest in biologically inspired systems, which, in the context of speech communications, attempt to learn from human auditory perception and cognition so as to derive knowledge and benefits currently unavailable in practice. One particular pursuit is to understand why the human auditory system generally performs far more robustly than an engineered system, such as a state-of-the-art automatic speech recognizer. In this study, we adopt a computational model of the mammalian central auditory system and develop a methodology to analyze and interpret its behavior, for an enhanced understanding of its end product: a data-redundant, dimension-expanded representation of neural firing rates in the primary auditory cortex (A1). Our first approach is to reinterpret the well-known Mel-frequency cepstral coefficients (MFCCs) in the context of the auditory model. We then present a framework for interpreting the cortical response as a place-coding of speech information, and identify some key advantages of the model's dimension expansion. The framework consists of a model of “source”-invariance that predicts how speech information is encoded in a class-dependent manner, and a model of “environment”-invariance that predicts the noise-robustness of class-dependent signal-respondent neurons. The validity of these ideas is experimentally assessed within an existing recognition framework by selecting features that demonstrate their effects and applying them in a conventional phoneme classification task. The results are discussed quantitatively and qualitatively, and our insights inspire future research on category-dependent features and speech classification using the auditory model.