Distant-talking Speech Recognition Based on Multi-objective Learning using Phase and Magnitude-based Feature

Abstract

Deep neural networks for speech enhancement are a topic of growing interest. In this paper, we propose a multi-objective learning method that uses amplitude and phase information for reverberant speech recognition. Previous studies have found that phase information is important for human speech recognition, yet it is ignored in almost all speech recognition front-ends. To address this problem, this paper proposes a multi-objective neural network method that optimizes speech enhancement and feature enhancement simultaneously. For phase information, Modified Group Delay Cepstral Coefficients (MGDCC) and Phase Domain Source-Filter separation based Vocal Tract (PBSFVT) features are used. We evaluate the proposed method on distant-talking speech recognition using the data set of Reverb Challenge 2014. The Word Error Rate (WER) was reduced from 26.57% for conventional deep neural network based dereverberation using magnitude features to 23.34% for the proposed method, a relative error reduction of 12.15%.
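The relative error reduction quoted in the abstract can be checked from the two WER figures. The following is a minimal sketch, not taken from the paper; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration.

```python
def relative_error_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative reduction in WER, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer * 100.0


# WER values quoted in the abstract (percent).
baseline = 26.57  # DNN-based dereverberation using magnitude features
proposed = 23.34  # proposed multi-objective method

reduction = relative_error_reduction(baseline, proposed)
print(f"Relative WER reduction: {reduction:.2f}%")
```

With the rounded WERs this comes to roughly 12.16%, which agrees with the reported 12.15% to within rounding; the small difference is presumably due to truncation or to the unrounded WER values used by the authors.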

Bibliographic record

Source
International Symposium on Chinese Spoken Language Processing | 2018 | pp. 394-398 | 5 pages
Authors
Dongbo Li; Longbiao Wang; Jianwu Dang; Meng Ge; Haotian Guan;
Original format: PDF
Keywords
Speech recognition; Speech enhancement; Task analysis; Delays; Training; Feature extraction; Neural networks;


Similar literature

1. Combination of bottleneck feature extraction and dereverberation for distant-talking speech recognition [J]. Ren Bo, Wang Longbiao, Lu Liang. Multimedia Tools and Applications. 2016, No. 9
2. Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm [J]. Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa. IEICE Transactions on Information and Systems. 2011, No. 3
3. Distant-talking Speech Recognition Based on Multi-objective Learning using Phase and Magnitude-based Feature [C]. Dongbo Li, Longbiao Wang, Jianwu Dang. International Symposium on Chinese Spoken Language Processing. 2018
4. Transfer Learning Approaches for Feature Denoising and Low-Resource Speech Recognition [D]. Bagchi, Deblin. 2020
5. Gradient-Based Multi-Objective Feature Selection for Gait Mode Recognition of Transfemoral Amputees [O]. Gholamreza Khademi, Hanieh Mohammadi, Dan Simon. 2019
6. Model-Based Dereverberation in the Logmelspec Domain for Robust Distant-Talking Speech Recognition [O]. Armin Sehr, Walter Kellermann. 2011
