Journal: Circuits, Systems, and Signal Processing

Speech Enhancement Using Source Information for Phoneme Recognition of Speech with Background Music

Abstract

This work explores the significance of source information for speech enhancement, leading to better phoneme recognition of speech with background music. The standard procedure for speech enhancement in noisy conditions applies temporal, spectral, and perceptual methods in sequence. This work follows the same sequential processing, with one addition: it studies the effect of the source, particularly in the temporal and perceptual enhancement stages, when enhancing speech segments with background music. The source information is characterized by epoch locations and epoch strengths. These are obtained as follows. First, component envelopes are computed with the single frequency filter (SFF) at each analysis frequency. Then the sum of their mean and standard deviation across frequencies is passed through a zero frequency filter (ZFF). In this work, this method of obtaining epoch locations and epoch strengths is termed SFF-ZFF. The enhanced segments are passed through phoneme recognizers built with three systems, all trained on clean speech:

- Gaussian mixture model-hidden Markov model (GMM-HMM)
- subspace Gaussian mixture model-hidden Markov model (SGMM-HMM)
- deep neural network-hidden Markov model (DNN-HMM)

The enhanced audio files show a lower phone error rate than the degraded audio files. This indicates that enhancement based on source information is beneficial for speech with background music.
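The SFF-ZFF step can be sketched as below. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation.

Several settings come from common SFF and ZFF practice rather than from the abstract:

- the pole radius `r = 0.995`;
- the analysis frequency grid;
- the use of the population standard deviation;
- the 10 ms trend-removal window applied three times;
- taking negative-to-positive zero crossings as epoch locations and the local slope as epoch strength.

All function names are hypothetical.

```python
import cmath
import math

# Assumed defaults (not stated in the paper's abstract): r=0.995, win_ms=10.0, passes=3.


def sff_envelopes(x, fs, freqs_hz, r=0.995):
    """Single frequency filtering: one magnitude envelope per analysis frequency.

    Each frequency f_k is shifted to fs/2 by multiplying x[n] with
    exp(j*(pi - 2*pi*f_k/fs)*n); the product is filtered by the single-pole
    filter y[n] = -r*y[n-1] + x_shifted[n], whose pole at z = -r lies at fs/2.
    """
    envelopes = []
    for f in freqs_hz:
        w_shift = math.pi - 2.0 * math.pi * f / fs
        y = 0j
        env = []
        for n, sample in enumerate(x):
            y = -r * y + sample * cmath.exp(1j * w_shift * n)
            env.append(abs(y))
        envelopes.append(env)
    return envelopes


def mean_plus_std(envelopes):
    """Per-sample mean + population standard deviation across frequencies."""
    out = []
    for column in zip(*envelopes):
        m = sum(column) / len(column)
        s = math.sqrt(sum((v - m) ** 2 for v in column) / len(column))
        out.append(m + s)
    return out


def _remove_local_mean(y, half):
    """Subtract the mean over a (2*half+1)-sample window centred on each sample."""
    n_total = len(y)
    out = []
    for n in range(n_total):
        lo, hi = max(0, n - half), min(n_total, n + half + 1)
        out.append(y[n] - sum(y[lo:hi]) / (hi - lo))
    return out


def zff_epochs(signal, fs, win_ms=10.0, passes=3):
    """Zero frequency filtering; returns (epoch locations, epoch strengths)."""
    # 1) Difference the input to suppress any DC offset (x[-1] taken as 0).
    d = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]
    # 2) Two cascaded zero-frequency resonators, i.e. four poles at z = 1:
    #    y[n] = 4y[n-1] - 6y[n-2] + 4y[n-3] - y[n-4] + d[n]
    y1 = y2 = y3 = y4 = 0.0
    y = []
    for v in d:
        yn = 4 * y1 - 6 * y2 + 4 * y3 - y4 + v
        y.append(yn)
        y4, y3, y2, y1 = y3, y2, y1, yn
    # 3) Remove the polynomial trend by repeated local-mean subtraction.
    half = max(1, round(win_ms * 1e-3 * fs / 2))
    for _ in range(passes):
        y = _remove_local_mean(y, half)
    # 4) Epochs: negative-to-positive zero crossings; strength: local slope.
    #    Samples within passes*half of either edge are skipped (edge artifacts).
    margin = passes * half
    locs, strengths = [], []
    for n in range(margin + 1, len(y) - margin):
        if y[n - 1] < 0 <= y[n]:
            locs.append(n)
            strengths.append(y[n] - y[n - 1])
    return locs, strengths


def sff_zff_epochs(x, fs, freqs_hz, r=0.995, win_ms=10.0):
    """SFF envelopes -> mean + std across frequency -> ZFF epochs."""
    return zff_epochs(mean_plus_std(sff_envelopes(x, fs, freqs_hz, r)), fs, win_ms)
```

As a usage example, consider an 8 kHz signal `x`. Calling `sff_zff_epochs(x, 8000, range(250, 4000, 250))` returns two equal-length lists: the sample indices of the detected epochs and their strengths. The explicit loops favour readability over speed. For long recordings, a vectorised filter such as `scipy.signal.lfilter` would be the usual choice.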