Journal: IEEE Transactions on Audio, Speech, and Language Processing

Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model

Abstract

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is generally desirable when the signal is acquired through distant microphones in applications such as hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A time-varying Gaussian source model (TVGSM) is introduced to represent the dynamic short-time characteristics of nonreverberant speech segments, including the time and frequency structures of the speech spectrum. With this model, dereverberation of the speech signal is formulated as a maximum-likelihood (ML) problem based on multichannel linear prediction, in which the speech signal is recovered by transforming the observed signal into one that is probabilistically more like nonreverberant speech. We first present a general ML solution based on TVGSM, and then derive several dereverberation algorithms based on various source models. Specifically, we present a source model consisting of a finite number of states, each of which is manifested by a short-time speech spectrum defined by a corresponding autocorrelation (AC) vector. The dereverberation algorithm based on this model involves a finite collection of spectral patterns that form a codebook. We confirm experimentally that both the time and frequency characteristics represented in the source models are very important for speech dereverberation, and that the prior knowledge represented by the codebook allows us to further improve the dereverberated speech quality. We also confirm that, using only a short speech signal and a speaker-independent codebook, the quality of reverberant speech signals can be greatly improved in terms of spectral shape and energy time-pattern distortions.
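
The idea of ML estimation under a time-varying Gaussian source model can be illustrated with a heavily simplified sketch. The code below is not the paper's algorithm: it assumes a single channel, works in the time domain, and uses a source model that carries only a per-frame variance (no spectral shape, no codebook). Under those assumptions, maximizing the Gaussian likelihood alternates between two steps: estimating the source variance of each short frame, and solving a variance-weighted least-squares problem for delayed linear prediction coefficients. The names `dereverberate` and `solve`, and the parameters `delay`, `taps`, `frame`, `iters`, and `floor`, are hypothetical and chosen for illustration only.

```python
import math
import random

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - tail) / M[i][i]
    return c

def dereverberate(x, delay=2, taps=3, frame=32, iters=3, floor=1e-6):
    """Alternate source-variance and prediction-coefficient updates (assumed setup)."""
    n = len(x)
    start = delay + taps - 1      # first sample with a full prediction history
    d = list(x)                   # initial source estimate: the observation itself
    c = [0.0] * taps
    for _ in range(iters):
        # Step 1: ML variance of a zero-mean Gaussian, constant within each frame.
        var = [floor] * n
        for f0 in range(0, n, frame):
            seg = d[f0:f0 + frame]
            v = max(sum(e * e for e in seg) / len(seg), floor)
            for t in range(f0, f0 + len(seg)):
                var[t] = v
        # Step 2: variance-weighted least squares for the prediction coefficients.
        R = [[0.0] * taps for _ in range(taps)]
        r = [0.0] * taps
        for t in range(start, n):
            past = [x[t - delay - k] for k in range(taps)]
            w = 1.0 / var[t]
            for i in range(taps):
                r[i] += w * x[t] * past[i]
                for j in range(taps):
                    R[i][j] += w * past[i] * past[j]
        c = solve(R, r)
        # Step 3: subtract the predicted late component from the observation.
        d = [x[t] - sum(c[k] * x[t - delay - k] for k in range(taps))
             if t >= start else x[t] for t in range(n)]
    return d, c

# Synthetic check: white noise with a slowly varying variance plays the role of
# the source, and a single recursive echo (gain 0.6, lag 2) plays the reverberation.
random.seed(0)
N = 4000
s = [(0.1 + abs(math.sin(2 * math.pi * t / 400))) * random.gauss(0, 1)
     for t in range(N)]
x = [0.0] * N
for t in range(N):
    x[t] = s[t] + (0.6 * x[t - 2] if t >= 2 else 0.0)

d, c = dereverberate(x)
print("estimated prediction coefficients:", c)
```

In this toy setup the first estimated coefficient should land close to the echo gain, and the output `d` should carry less energy than `x`. The weighting by inverse variance is what distinguishes this from ordinary least squares: frames where the source is quiet count more, since any energy observed there is more likely to be reverberation.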
