A phase generation method for speech reconstruction from spectral envelope and pitch intervals

Abstract

We propose a new method for reconstructing speech from its spectral envelope and pitch intervals, applicable as a playback function on the network side of a distributed speech recognition system. The spectral envelope of speech is represented as a set of Mel-frequency cepstral coefficients (MFCCs), which are well-known recognition parameters. First, sinusoidal synthesis with a zero-phase model is used to obtain a pitch-based waveform. To enhance the naturalness of the speech, we replace the zero-phase information with prestored linear and random codebooks. The final phase information is determined by the energy ratio between the linear and random components. Unlike classic low-bit-rate speech coding, however, the energy ratio is estimated at the decoding stage by applying a time-frequency filter to the pitch-based synthesized signal. Thus, the phase information is not a feature parameter from the encoder side. The proposed phase generation method exploits the knowledge that pitch variation is a main cause of the mixed characteristics in speech signals. An informal listening test verifies that the quality of the proposed method is much better than the synthetic quality.
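
The synthesis idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the codebooks, the time-frequency filter, or how the energy ratio maps to phase. The function names, the `linear_fraction` parameter (standing in for an already estimated energy ratio), the `linear_slope` value, and the rule that assigns linear phase to the lowest harmonics are all assumptions made for illustration.

```python
import math
import random


def synthesize_frame(amplitudes, f0_hz, sample_rate, num_samples, phases=None):
    """Sum harmonics of f0_hz; with phases=None this is a zero-phase model."""
    if phases is None:
        phases = [0.0] * len(amplitudes)
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = 0.0
        # Harmonic k has frequency k * f0_hz, amplitude amp, and phase phi.
        for k, (amp, phi) in enumerate(zip(amplitudes, phases), start=1):
            sample += amp * math.cos(2.0 * math.pi * k * f0_hz * t + phi)
        frame.append(sample)
    return frame


def mixed_phases(num_harmonics, linear_fraction, rng, linear_slope=0.5):
    """Hypothetical mixing rule (an assumption, not taken from the paper):
    the lowest harmonics get a linear phase, the remaining ones a random phase.
    linear_fraction in [0, 1] stands in for the estimated energy ratio."""
    linear_fraction = min(1.0, max(0.0, linear_fraction))
    num_linear = round(linear_fraction * num_harmonics)
    phases = []
    for k in range(1, num_harmonics + 1):
        if k <= num_linear:
            phases.append(-linear_slope * k)  # phase linear in harmonic index
        else:
            phases.append(rng.uniform(-math.pi, math.pi))  # random phase
    return phases


# Usage: a zero-phase frame versus a frame with mixed phases.
amplitudes = [1.0, 0.5, 0.25, 0.125]  # illustrative harmonic amplitudes
zero_phase = synthesize_frame(amplitudes, 100.0, 8000, 160)
phases = mixed_phases(len(amplitudes), 0.5, random.Random(0))
mixed = synthesize_frame(amplitudes, 100.0, 8000, 160, phases)
```

In a real system the harmonic amplitudes would be sampled from the MFCC-derived spectral envelope, and the ratio would be estimated from the zero-phase signal itself rather than passed in by hand.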