IEEE Transactions on Audio, Speech and Language Processing

Low bit-rate voice compression based on frequency domain interpolative techniques



Abstract

This paper presents an approach, referred to as frequency domain interpolation (FDI), for achieving high-quality speech at low bit-rates (4 kb/s and below) within reasonable complexity and delay. FDI methods, like prototype waveform interpolation (PWI) methods, derive a prototype waveform (PW) at regular intervals of time. Unlike PWI, however, the PW is not separated into a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW) component. Instead, the PW is encoded in magnitude-phase form after gain normalization. The magnitude is modeled as a sum of mean and deviation values in multiple frequency bands, and this model is quantized using switched backward adaptive VQ techniques. The phase information is represented as a composite vector of PW correlations in multiple frequency bands and an overall voicing measure, and is quantized using a VQ at the encoder. At the decoder, a phase model uses the received phase (and magnitude) information to reproduce PWs with the correct periodicity and evolutionary characteristics. Speech is synthesized by interpolating the reconstructed PWs after gain adjustment and filtering the result with the short-term predictor and a postfilter. The designs of a 4-kb/s and a 2.4-kb/s FDI codec are presented, and their performance is characterized in terms of delay, complexity, and subjective voice quality. The results confirm that FDI techniques have the potential for delivering high-quality speech at low bit-rates in a cost-effective manner.
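
The magnitude model mentioned in the abstract (a per-band mean plus deviations from that mean) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the band boundaries, and the use of plain arithmetic means are assumptions made for illustration, and the quantization step (switched backward adaptive VQ) is omitted entirely.

```python
def band_mean_deviation(magnitudes, band_edges):
    """Split a PW magnitude spectrum into per-band means and per-bin deviations.

    magnitudes: harmonic magnitudes of one gain-normalized prototype waveform.
    band_edges: hypothetical band boundaries given as indices; [0, 2, 4]
                describes the bands [0, 2) and [2, 4). Every band is assumed
                to be non-empty.
    """
    means = []
    deviations = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = magnitudes[lo:hi]
        mean = sum(band) / len(band)
        means.append(mean)
        # Deviation of each bin from the mean of its own band.
        deviations.extend(m - mean for m in band)
    return means, deviations


def reconstruct_magnitude(means, deviations, band_edges):
    """Rebuild the magnitude spectrum as band mean plus deviation."""
    start = band_edges[0]
    magnitudes = []
    for mean, lo, hi in zip(means, band_edges[:-1], band_edges[1:]):
        # Deviations are stored contiguously, starting at the first band edge.
        magnitudes.extend(mean + d for d in deviations[lo - start:hi - start])
    return magnitudes


# Usage with made-up values: four harmonics split into two bands.
spectrum = [1.0, 3.0, 2.0, 6.0]
edges = [0, 2, 4]
means, deviations = band_mean_deviation(spectrum, edges)
restored = reconstruct_magnitude(means, deviations, edges)
```

In a codec the means and deviations would each be quantized before transmission, so the decoder's reconstruction would only approximate the original spectrum; the sketch shows the lossless decomposition alone.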
