Conference: International Symposium on Chinese Spoken Language Processing

Distant-talking Speech Recognition Based on Multi-objective Learning using Phase and Magnitude-based Feature


Abstract

Deep neural networks for speech enhancement are attracting increasing interest. In this paper, we propose a multi-objective learning method that uses both amplitude and phase information for reverberant speech recognition. Previous studies have found that phase information is important for human speech recognition, yet it is ignored by almost all speech recognition front-ends. To address this problem, we propose a multi-objective neural network that optimizes speech enhancement and feature enhancement simultaneously. For phase information, Modified Group Delay Cepstral Coefficients (MGDCC) and Phase Domain Source-Filter separation based Vocal Tract (PBSFVT) features are used. We evaluate the proposed method on distant-talking speech recognition using the data set of Reverb Challenge 2014. The Word Error Rate (WER) was reduced from 26.57% for conventional deep neural network based dereverberation using the magnitude feature to 23.34% for the proposed method, a relative error reduction of 12.15%.
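
The relative error reduction quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the function name `relative_reduction` is an assumption made for this example and does not come from the paper, and the inputs are the two rounded WER values stated in the abstract.

```python
def relative_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative error reduction in percent.

    Both arguments are word error rates expressed in percent.
    The name and signature are hypothetical, chosen for this example.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer * 100.0


# WER values as reported in the abstract (in percent).
baseline = 26.57   # DNN-based dereverberation with the magnitude feature
proposed = 23.34   # proposed multi-objective method

print(f"{relative_reduction(baseline, proposed):.4f}%")
```

With the rounded values from the abstract, this evaluates to roughly 12.157%, which matches the reported 12.15% up to rounding of the underlying figures.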
