Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Instantaneous Fundamental Frequency Estimation With Optimal Segmentation for Nonstationary Voiced Speech

Abstract

In speech processing, speech is often assumed to be stationary within segments of 20-30 ms, even though this is well known not to be true. In this paper, we take the nonstationarity of voiced speech into account by describing the speech signal with a linear chirp model. We propose a maximum likelihood estimator of the fundamental frequency and chirp rate of this model and show that it attains the Cramér-Rao lower bound. Since speech varies over time, a fixed segment length is not optimal, so we propose segmenting the signal based on the maximum a posteriori (MAP) criterion. With this segmentation method, the segments are on average longer for the chirp model than for the traditional harmonic model: for the signal under test, the average segment lengths are 24.4 ms and 17.1 ms, respectively. This suggests that the chirp model fits the speech signal better than the harmonic model. The methods assume white Gaussian noise, and, therefore, two prewhitening filters are also proposed.
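To make the estimation idea concrete, here is a minimal sketch of a grid-search maximum likelihood (nonlinear least-squares) estimator for a harmonic chirp model in white Gaussian noise. It is an illustration under stated assumptions, not the authors' implementation: it assumes that harmonic h has instantaneous frequency h(ω0 + αn) rad/sample (one shared linear chirp rate, scaled by the harmonic number), that the number of harmonics is known, that the time index is centred on the segment, and that an exhaustive grid search is acceptable. All function names and grid values are hypothetical. The MAP segmentation and the prewhitening filters are not covered.

```python
import numpy as np


def centred_index(num_samples):
    """Time index n centred on the segment (assumption: omega0 refers to the midpoint)."""
    return np.arange(num_samples) - (num_samples - 1) / 2.0


def harmonic_chirp_basis(n, omega0, alpha, num_harmonics):
    """Columns cos/sin of h * (omega0 * n + 0.5 * alpha * n**2) for h = 1..L.

    Harmonic h then has instantaneous frequency h * (omega0 + alpha * n) rad/sample.
    """
    cols = []
    for h in range(1, num_harmonics + 1):
        phase = h * (omega0 * n + 0.5 * alpha * n ** 2)
        cols.append(np.cos(phase))
        cols.append(np.sin(phase))
    return np.column_stack(cols)


def nls_cost(x, omega0, alpha, num_harmonics):
    """Return x^T P x, where P projects onto the span of the chirp basis.

    Under white Gaussian noise, maximising this over (omega0, alpha) gives the
    ML estimate; the harmonic amplitudes and phases are solved linearly.
    """
    Z = harmonic_chirp_basis(centred_index(len(x)), omega0, alpha, num_harmonics)
    coeffs, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return float(x @ (Z @ coeffs))


def estimate_f0_and_chirp(x, fs, f0_grid_hz, chirp_grid_hz_per_s, num_harmonics):
    """Exhaustive 2-D grid search; returns (f0 in Hz at segment centre, chirp rate in Hz/s)."""
    best_cost, best_f0, best_chirp = -np.inf, None, None
    for f0 in f0_grid_hz:
        omega0 = 2 * np.pi * f0 / fs
        for chirp in chirp_grid_hz_per_s:
            alpha = 2 * np.pi * chirp / fs ** 2  # Hz/s -> rad/sample^2
            cost = nls_cost(x, omega0, alpha, num_harmonics)
            if cost > best_cost:
                best_cost, best_f0, best_chirp = cost, f0, chirp
    return best_f0, best_chirp


# Synthetic 30 ms voiced segment (hypothetical values): f0 = 200 Hz, chirp = 500 Hz/s.
fs = 8000.0
n = centred_index(int(0.030 * fs))
true_basis = harmonic_chirp_basis(n, 2 * np.pi * 200.0 / fs, 2 * np.pi * 500.0 / fs ** 2, 3)
rng = np.random.default_rng(0)
x = true_basis @ np.array([1.0, 0.0, 0.5, 0.3, 0.2, -0.2]) + rng.normal(0.0, 0.01, len(n))

f0_hat, chirp_hat = estimate_f0_and_chirp(
    x, fs,
    f0_grid_hz=np.arange(150.0, 251.0, 1.0),
    chirp_grid_hz_per_s=np.arange(-1000.0, 1001.0, 100.0),
    num_harmonics=3,
)
print(f"f0 = {f0_hat:.1f} Hz, chirp rate = {chirp_hat:.0f} Hz/s")
```

Setting the chirp rate to zero reduces the basis to the traditional harmonic model, so the grid search above also evaluates that model as a special case. In practice a coarse grid would typically be refined by a local optimiser rather than made finer everywhere.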