Journal: Advances in Experimental Medicine and Biology

Do We Need STRFs for Cocktail Parties? On the Relevance of Physiologically Motivated Features for Human Speech Perception Derived from Automatic Speech Recognition.


Abstract

Complex auditory features such as spectro-temporal receptive fields (STRFs) derived from cortical auditory neurons appear to be advantageous in sound processing. However, their physiological and functional relevance is still unclear. To assess the utility of such feature processing for speech reception in noise, automatic speech recognition (ASR) performance using feature sets obtained from physiological and/or psychoacoustical data and models is compared to human performance. Time-frequency representations with a nonlinear compression are compared with standard features such as mel-scaled spectrograms. Both alternatives serve as input to model estimators that infer spectro-temporal filters (and a subsequent nonlinearity) from physiological measurements in auditory brain areas of zebra finches. Alternatively, a filter bank of 2-dimensional Gabor functions is employed, which covers a wide range of modulation frequencies in the time and frequency domains. The results indicate a clear increase in ASR robustness using complex features (modeled by Gabor functions), while the benefit from physiologically derived STRFs is limited. In all cases, the use of power-normalized spectral representations increases performance, indicating that substantial dynamic compression is advantageous for level-independent pattern recognition. The methods employed may help physiologists to look for more relevant STRFs and to better understand specific differences in estimated STRFs.
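The two processing ideas named in the abstract, a bank of 2-dimensional Gabor functions and a strong dynamic compression of the spectrogram, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`compress`, `gabor_2d`, `gabor_bank`, `apply_filter`), the filter sizes, the modulation frequencies, the Hann envelope, and the compression exponent are all assumptions chosen for illustration.

```python
import numpy as np


def compress(power_spec, exponent=0.1):
    """Power-law dynamic compression; the exponent is an illustrative assumption."""
    return np.power(np.maximum(power_spec, 0.0), exponent)


def gabor_2d(omega_t, omega_f, n_t=25, n_f=9):
    """Real 2-D Gabor patch: a plane-wave carrier under a Hann envelope.

    omega_t, omega_f: carrier modulation frequencies in radians per frame
    and radians per frequency channel (hypothetical units for this sketch).
    """
    t = np.arange(n_t) - n_t // 2
    f = np.arange(n_f) - n_f // 2
    # Drop the zero end points of the Hann window so every tap is non-zero.
    envelope = np.outer(np.hanning(n_f + 2)[1:-1], np.hanning(n_t + 2)[1:-1])
    carrier = np.cos(omega_f * f[:, None] + omega_t * t[None, :])
    g = envelope * carrier
    # Remove the DC component so the filter ignores constant level offsets.
    return g - envelope * (g.sum() / envelope.sum())


def gabor_bank(temporal_mods=(0.25, 0.5, 1.0), spectral_mods=(-0.5, 0.0, 0.5)):
    """Small bank covering several temporal and spectral modulation frequencies."""
    return [gabor_2d(wt, wf) for wt in temporal_mods for wf in spectral_mods]


def apply_filter(spec, kernel):
    """Correlate a (channels x frames) spectrogram with one kernel, same-size output."""
    pad_f, pad_t = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(spec, ((pad_f, pad_f), (pad_t, pad_t)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    power_spec = rng.random((23, 100))  # stand-in for a mel-scaled power spectrogram
    features = np.stack([apply_filter(compress(power_spec), k) for k in gabor_bank()])
    print(features.shape)
```

Each filter responds to one combination of temporal and spectral modulation. The sign of the spectral modulation frequency sets the orientation of the carrier, so the bank contains both upward and downward sweeping patterns. Compressing before filtering reduces the dependence of the filter outputs on absolute level.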
