Analysis-by-synthesis multimode harmonic speech coding at low bit rate.

Abstract

To satisfy a growing demand for new speech communication technologies in the past decade, speech coders capable of delivering toll-quality speech at lower and lower rates have been developed at an astonishing pace. Many promising low-rate coding schemes have been proposed in recent years to bridge the gap between toll-quality waveform coding at rates above 6.3 kb/s and moderate-quality vocoding at rates around 2.4 kb/s. Specifically, at present, there is a surge of research and commercial interest in developing high-quality speech coders operating at bit rates of 4 kb/s and below.

The main objective of this dissertation is to develop new techniques that allow a harmonic (sinusoidal) speech coder to overcome various limitations in order to achieve high quality at a rate of 4 kb/s or below.

In harmonic coders, speech is traditionally classified as voiced or unvoiced. Harmonic coding of speech uses the harmonic model for voiced speech and the noise model for stationary unvoiced speech. However, experimental evidence shows that the poor representation of transitory speech segments results in a significant degradation in the reconstructed speech quality. In order to improve the speech model accuracy in non-stationary speech segments, this dissertation introduces a novel frequency-domain speech model for transition coding. This model represents significant time-domain events (pulses) by using a generalized sinusoidal model. A closed-loop analysis-by-synthesis parameter estimation procedure was devised for the new transition speech model.

In order to improve the accuracy and robustness of parameter estimation in harmonic coders, we introduced a novel time-domain analysis-by-synthesis parameter estimation method in the harmonic coding framework. In this method, we propose the use of a nonlinear time scale modification technique for overcoming the waveform matching obstacle in harmonic coders and thereby achieving time-domain closed-loop parameter estimation. The effectiveness of this method is demonstrated by a specific algorithm for pitch and class estimation.

In this research, we also conducted a comprehensive study of the quantization issue for the variable-dimension harmonic magnitude vector. We incorporated speech perceptual weighting with the non-square transform vector quantization (NSTVQ) for harmonic spectral magnitude quantization. We showed that the weighted NSTVQ is a generalization of all existing linear transform-based variable-dimension vector quantization schemes. Within the framework provided by the weighted NSTVQ, two implementation schemes were proposed and compared with respect to computational complexity. We demonstrated that the weighted NSTVQ system has the ability to trade performance for complexity and memory storage by selecting different transforms and the length of fixed-dimension vectors.

In order to demonstrate the viability of the new techniques studied in this research, we designed a 4 kb/s analysis-by-synthesis multimode harmonic coder (AbS-MHC) employing the proposed frequency-domain transition speech modeling technique, the proposed analysis-by-synthesis parameter estimation technique for pitch/class estimation, and the new variable-dimension vector quantization technique. The resulting AbS-MHC coder at a rate of 4 kb/s achieves a perceptual quality very similar to the G.723.1 coder at 6.3 kb/s, as indicated by subjective test results.
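The harmonic model for voiced speech mentioned in the abstract, and the reason the harmonic magnitude vector has a variable dimension, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The 8 kHz sampling rate, the 20 ms frame length, and the function names `num_harmonics`, `synth_voiced_frame`, and `bits_per_frame` are all assumptions introduced here for illustration.

```python
import math


def num_harmonics(f0_hz, fs_hz=8000.0):
    """Count the harmonics of f0 that lie strictly below Nyquist (fs/2).

    The count depends on the pitch, which is why the magnitude vector
    has a different dimension from frame to frame.
    The 8 kHz default sampling rate is an assumption.
    """
    nyquist = fs_hz / 2.0
    k = int(nyquist // f0_hz)
    if k * f0_hz >= nyquist:  # exclude a harmonic that falls exactly on Nyquist
        k -= 1
    return k


def synth_voiced_frame(f0_hz, magnitudes, phases, n_samples, fs_hz=8000.0):
    """Generic harmonic model: s[n] = sum_k A_k * cos(k * w0 * n + phi_k)."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz  # fundamental in radians per sample
    return [
        sum(
            a * math.cos((k + 1) * w0 * n + p)
            for k, (a, p) in enumerate(zip(magnitudes, phases))
        )
        for n in range(n_samples)
    ]


def bits_per_frame(bit_rate_bps, frame_ms=20.0):
    """Bit budget per frame; the 20 ms frame length is an assumption."""
    return bit_rate_bps * frame_ms / 1000.0
```

Under these assumptions, a low-pitched talker at 100 Hz needs 39 magnitudes per frame while a high-pitched talker at 300 Hz needs only 13, and a 4 kb/s coder would have 80 bits per frame to spend on all parameters. A quantizer for the magnitudes therefore has to handle vectors whose length changes with pitch.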

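Bibliographic record

  • Author: Li, Chunyan
  • Author affiliation: University of California, Santa Barbara
  • Degree-granting institution: University of California, Santa Barbara
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2000
  • Pages: 142
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Radio electronics, telecommunications technology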