IEEE Journal on Selected Areas in Communications

Time envelope vocoder, a new LP based coding strategy for use at bit rates of 2.4 kb/s and below

Abstract

This paper presents a linear prediction (LP) based vocoder employing a novel technique that ensures smooth evolution of the synthetic speech waveform. In this coder, speech waveforms are considered to have a 'time envelope', the shape of which contains important perceptual information. By ensuring that the time envelope of the synthetic speech closely matches that of the original, natural-sounding synthetic speech can be produced. Envelope matching may be achieved using a new, low-complexity analysis-by-synthesis loop at the decoder, which determines the synthetic excitation energy. The advantage over more traditional linear prediction vocoders is that the amplitude time envelope is preserved in addition to the spectral envelope. This allows the rapid amplitude transitions associated with onsets to be retained in the synthetic speech, resulting in more intelligible output. Simply controlling the overall energy of the synthetic excitation is not sufficient to control the synthetic speech energy accurately. Small changes in the linear prediction or pitch parameters, due to quantization for example, can cause variations in the synthetic speech amplitude, especially from one pitch cycle to the next, resulting in noisy synthetic speech. Including an analysis-by-synthesis loop at the decoder ensures that the synthetic speech amplitude is independent of variations in the pitch period and LP parameters. This paper presents a complete vocoder scheme using time envelope matching, including details of techniques such as parameter interpolation, excitation pulse shaping and pitch tracking, which have proven necessary to produce natural-sounding synthetic speech at 2.4 kb/s and below.
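The abstract does not say how the decoder loop computes the excitation gain, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes each segment is one pitch cycle, that a target energy per segment is available from the decoded time envelope, and that synthesis uses an all-pole filter 1/A(z) whose memory carries over between segments. Because the filter is linear, the synthetic segment equals the zero-input response (ringing from the previous segment) plus the gain times the zero-state response. The gain that hits the target energy is therefore a root of a quadratic. This closed-form solve is a substituted technique chosen for clarity, and the names `lp_synthesize`, `match_gain` and `synthesize_with_envelope` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def lp_synthesize(excitation: Sequence[float], a: Sequence[float],
                  memory: Sequence[float]) -> Tuple[List[float], List[float]]:
    """All-pole LP synthesis 1/A(z), A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p.

    Assumes a[0] == 1. `memory` holds the last p outputs, most recent first.
    Returns the filtered segment and the updated memory.
    """
    p = len(a) - 1
    hist = list(memory)  # hist[0] = y[n-1], hist[1] = y[n-2], ...
    out: List[float] = []
    for x in excitation:
        y = x - sum(a[k] * hist[k - 1] for k in range(1, p + 1))
        out.append(y)
        hist = ([y] + hist)[:p]  # shift the newest output into the memory
    return out, hist


def match_gain(excitation: Sequence[float], a: Sequence[float],
               memory: Sequence[float], target_energy: float) -> float:
    """Hypothetical decoder-side gain search (assumed, not from the paper).

    By linearity the segment is y = zir + g * zsr, so its energy is
    A*g^2 + 2*B*g + C; solve A*g^2 + 2*B*g + C = target_energy for g >= 0.
    """
    p = len(a) - 1
    zir, _ = lp_synthesize([0.0] * len(excitation), a, memory)  # ringing only
    zsr, _ = lp_synthesize(excitation, a, [0.0] * p)            # unit-gain response
    A = sum(v * v for v in zsr)
    B = sum(u * v for u, v in zip(zir, zsr))
    C = sum(u * u for u in zir)
    if A == 0.0:
        return 0.0  # excitation produces no output, so the gain has no effect
    disc = B * B - A * (C - target_energy)
    if disc < 0.0:
        # Target lies below the lowest reachable energy: take the
        # energy-minimising gain, clipped at zero.
        return max(0.0, -B / A)
    return max(0.0, (-B + math.sqrt(disc)) / A)


def synthesize_with_envelope(segments: Sequence[Sequence[float]],
                             coefs: Sequence[Sequence[float]],
                             target_energies: Sequence[float],
                             p: int) -> List[float]:
    """Synthesize segment by segment so each one hits its target energy.

    All segments are assumed to share the same LP order p.
    """
    memory = [0.0] * p
    out: List[float] = []
    for exc, a, energy in zip(segments, coefs, target_energies):
        g = match_gain(exc, a, memory, energy)
        y, memory = lp_synthesize([g * x for x in exc], a, memory)
        out.extend(y)
    return out


# Usage: one-pulse excitations through a first-order filter whose
# coefficient changes every segment (e.g. after quantization).
segs = [[1.0] + [0.0] * 39 for _ in range(3)]
coefs = [[1.0, -0.9], [1.0, -0.5], [1.0, -0.95]]
targets = [1.0, 4.0, 0.25]
speech = synthesize_with_envelope(segs, coefs, targets, p=1)
```

In this toy run each 40-sample segment comes out with its target energy even though A(z) changes from segment to segment. A fixed excitation scale factor would not account for the changing filter gain or the ringing carried over from the previous segment.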
