INTERSPEECH 2012

Phone recognition in critical bands using sub-band temporal modulations

Abstract

This study investigates a multi-stream phone recognition system consisting of 21 parallel sub-systems, each covering two critical bands, that are fused by a multi-layer perceptron (MLP) system. Within each band, speech information is encoded by the frequency-domain linear prediction (FDLP) feature, which characterizes the temporal modulation of the sub-band envelope. Two experiments are conducted to determine the optimal parameters for the speech features, namely the maximum temporal modulation frequency F_m and the context window length T, followed by an experiment to evaluate the robustness of the fused system in noise. Results show that the phone accuracies of the sub-systems peak at a context window length of about 500-600 ms, and that they increase monotonically as the maximum temporal modulation frequency rises from 4 to 40 Hz, where they saturate. Tests of the fused system in babble and subway noise at 15 dB SNR indicate that the multi-stream system is more robust to noise than the single-stream baseline system.
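As an illustration of the 15 dB SNR test condition mentioned in the abstract, the sketch below scales a noise signal so that the speech-to-noise power ratio hits a target value before the two are added. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say how noise was mixed, the function names (`mean_power`, `mix_at_snr`, `measured_snr_db`) are hypothetical, and the sine wave and Gaussian noise are toy stand-ins for real speech and babble or subway recordings.

```python
import math
import random


def mean_power(samples):
    """Average power of a signal (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` to reach `target_snr_db` relative to `speech`, then add it.

    Returns the noisy mixture and the scaled noise (hypothetical helper).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measured_snr_db(speech, noise):
    """SNR in dB computed from the two component signals."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


rng = random.Random(0)
# Toy stand-ins (assumptions): a 200 Hz tone for speech, white noise for babble.
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy, scaled_noise = mix_at_snr(speech, noise, 15.0)
print(round(measured_snr_db(speech, scaled_noise), 6))
```

Power here is averaged over the whole signal; a real evaluation would more likely measure speech level over active speech regions only, which this sketch does not attempt.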