
Embedding perceptual linear prediction models in speech and audio coding.


Abstract

The application of perceptual models to speech and audio coding began receiving attention in the late 1970s. Even today, the methods that speech coding standards use to exploit the masking properties of the human ear are largely based on relatively old concepts introduced by Schroeder and Atal in 1979. This dissertation studies a series of problems encountered when new perceptual models are applied in prediction-based speech and audio coding algorithms. It also explores different ways of integrating advanced models of human hearing into standardized fixed- and variable-bit-rate vocoders.

Specific problems addressed in this dissertation include: (1) the significance of auditory excitation patterns in speech analysis and synthesis, (2) the performance of the perceptual loudness metric in variable-bit-rate speech coding, and (3) adaptive pole estimation algorithms for linear prediction (LP) analysis in cascade form. The investigation of these problems resulted in two new algorithms for use in wideband speech coding and in variable-bit-rate speech coders.

The first is the perceptually-motivated all-pole (PMAP) modeling algorithm. PMAP is based on an approach for estimating perceptually relevant pole locations; the estimated perceptual poles are then used to construct an all-pole filter for speech analysis. The PMAP approach is compared with existing perceptually based linear prediction methods, namely perceptual LP and Warped LP. PMAP modeling (1) provides a way to integrate psychoacoustic principles into LP through auditory excitation pattern (AEP) matching, (2) enables estimation and perceptual ranking of the speech formants, and (3) yields an LP prediction residual with lower perceptual loudness. Computational profiling of the PMAP algorithm identified its most computationally complex modules; in particular, the AEP-matching search accounted for the majority of the complexity. A fast-PMAP modeling approach that employs a block form of AEP matching was therefore developed. By exploiting the properties of the parametric spreading function and its energy-preserving smearing operation, the AEPs are estimated recursively, which reduces computation significantly (by over 50%). Experiments comparing the performance of the fast-PMAP algorithm with that of the original PMAP algorithm are included.

The second algorithm is perceptual-loudness (PL) based rate determination. Unlike existing rate selection strategies, which rely on a voice activity detector and energy thresholds, the proposed method employs a perceptual loudness measure. The Enhanced Variable Rate Codec (EVRC) serves as the test-bed for evaluating the PL-based rate selection strategy. Experimental results demonstrate that PL-based rate determination compares well with energy-based rate selection techniques in terms of average bit rate and speech quality. A fast PL-based rate selection algorithm, which applies LP-analysis-driven pre-filtering followed by partial loudness estimation, is also proposed.
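
The PMAP approach builds an all-pole analysis filter from a set of estimated pole locations. The minimal sketch below shows only the generic LP machinery such a method sits on. It assumes NumPy and the usual convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and it covers three steps: autocorrelation LP via the Levinson-Durbin recursion, recovering the poles as roots of A(z), and rebuilding an all-pole filter from a chosen subset of poles. All function names are illustrative. The selection rule, which keeps the pole pairs closest to the unit circle (the sharpest resonances), is a hypothetical placeholder. The dissertation ranks poles perceptually through AEP matching, and that ranking is not reproduced here.

```python
import numpy as np

# Sketch only: generic autocorrelation LP and pole manipulation, not the PMAP algorithm.


def autocorr(x: np.ndarray, order: int) -> np.ndarray:
    """Biased autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r: np.ndarray, order: int) -> tuple[np.ndarray, float]:
    """Solve the LP normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient vector [1, a1, ..., ap] and the final
    prediction-error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        # Reflection coefficient of stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        # a_new[j] = a[j] + k * a[i - j] for j = 1..i (a[i] starts at 0, so a_new[i] = k).
        a[1 : i + 1] = a[1 : i + 1] + k * a[i - 1 :: -1]
        err *= 1.0 - k * k
    return a, err


def lp_poles(a: np.ndarray) -> np.ndarray:
    """Poles of 1/A(z): the roots of z^p + a1 z^(p-1) + ... + ap."""
    return np.roots(a)


def keep_sharpest_resonances(poles: np.ndarray, n_pairs: int) -> np.ndarray:
    """HYPOTHETICAL selection rule: keep the n_pairs complex-conjugate pole pairs
    closest to the unit circle (narrowest bandwidths). A placeholder, not the
    dissertation's AEP-matching perceptual ranking."""
    upper = poles[poles.imag > 0]
    upper = upper[np.argsort(-np.abs(upper))][:n_pairs]
    return np.concatenate([upper, upper.conj()])


def allpole_from_poles(poles: np.ndarray) -> np.ndarray:
    """A(z) coefficients [1, c1, ..., cq] whose roots are the given poles."""
    return np.real(np.poly(poles))


def pole_frequencies_hz(poles: np.ndarray, fs: float) -> np.ndarray:
    """Resonance frequencies (Hz) of the upper-half-plane poles, ascending."""
    upper = poles[poles.imag > 0]
    return np.sort(np.angle(upper) * fs / (2.0 * np.pi))
```

For one frame, `a, _ = levinson_durbin(autocorr(frame * np.hamming(len(frame)), 10), 10)` gives a 10th-order model. Then `allpole_from_poles(keep_sharpest_resonances(lp_poles(a), 3))` returns a 6th-order all-pole filter built from three selected resonances. Working at the pole level rather than on the direct-form coefficients is what makes per-resonance ranking possible at all.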

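
Both fast-PMAP and PL-based rate determination depend on auditory excitation patterns. An excitation pattern is a band-power spectrum smeared along the critical-band (Bark) axis by a spreading function, and loudness can be computed from it. The sketch below rests on several choices the abstract does not specify, all of them assumptions:
- Zwicker and Terhardt's Hz-to-Bark formula.
- Schroeder's spreading function, with each column normalized so that the smearing preserves total energy.
- A simplified Zwicker-style power law (exponent 0.23) for specific loudness, with no absolute-threshold term.
- Hypothetical loudness thresholds for a three-level rate decision.

The sketch also shows why a linear smearing operator admits cheap incremental updates: changing one band's power changes the pattern by one scaled column. That illustrates the general idea behind recursive AEP estimation only. It is not the dissertation's block-form algorithm, and the thresholds are not taken from EVRC.

```python
import numpy as np

# Sketch only: a generic excitation-pattern / loudness pipeline under stated assumptions.


def hz_to_bark(f_hz: np.ndarray) -> np.ndarray:
    """Zwicker & Terhardt Hz-to-Bark approximation."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def band_powers(frame: np.ndarray, fs: float, edges_hz: np.ndarray) -> np.ndarray:
    """Sum |FFT|^2 of a frame into bands [edges[k], edges[k+1])."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    idx = np.digitize(freqs, edges_hz) - 1
    return np.array([spec[idx == k].sum() for k in range(len(edges_hz) - 1)])


def spreading_matrix(z: np.ndarray) -> np.ndarray:
    """S[j, i]: share of band i's power (masker at z[i]) spread into band j.

    ASSUMPTION: Schroeder's spreading function in dB, with every column
    normalized to sum to 1 so the smearing E = S @ P preserves total energy.
    """
    dz = z[:, None] - z[None, :]  # maskee minus masker, in Bark
    sf_db = 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)
    s = 10.0 ** (sf_db / 10.0)
    return s / s.sum(axis=0, keepdims=True)


def excitation_pattern(s: np.ndarray, band_power: np.ndarray) -> np.ndarray:
    """Excitation pattern: band powers smeared across the Bark axis."""
    return s @ band_power


def update_excitation(e: np.ndarray, s: np.ndarray, band: int, old: float, new: float) -> np.ndarray:
    """Incremental update when only one band's power changes.

    E = S @ P is linear, so E_new = E_old + S[:, band] * (new - old):
    O(N) work instead of the O(N^2) full recomputation.
    """
    return e + s[:, band] * (new - old)


def loudness(e: np.ndarray, exponent: float = 0.23) -> float:
    """Simplified total loudness in arbitrary units: sum of E**0.23 over bands.

    ASSUMPTION: compressive Zwicker-style power law; absolute threshold and
    calibration to sones are omitted.
    """
    return float(np.sum(np.maximum(e, 0.0) ** exponent))


# HYPOTHETICAL thresholds in the arbitrary loudness units above (not EVRC values).
RATE_THRESHOLDS = ((40.0, "full"), (15.0, "half"))


def select_rate(frame_loudness: float) -> str:
    """Map frame loudness to a rate label; anything below every threshold is 'eighth'."""
    for threshold, rate in RATE_THRESHOLDS:
        if frame_loudness >= threshold:
            return rate
    return "eighth"
```

In a rate-selection loop, each frame (for example 160 samples, i.e. 20 ms at 8 kHz) would pass through `band_powers`, `excitation_pattern`, `loudness` and `select_rate`. The thresholds would have to be tuned for the codec, since the abstract reports evaluating the PL-based strategy in terms of average bit rate and speech quality.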

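Bibliographic details

Author: Atti, Venkatraman S.
Affiliation: Arizona State University
Degree-granting institution: Arizona State University
Subject: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2006
Pages: 149
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics and telecommunication technology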

Similar literature


1. Reversible Audio Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 Speech [J]. Akira Nishimura. IEICE Transactions on Information and Systems, 2016, no. 1.

2. Perceptual Models for Speech, Audio, and Music Processing [J]. Jont B. Allen, Wai-Yip Geoffrey Chan, Stephen Voran. EURASIP Journal on Audio, Speech, and Music Processing, 2007, no. 1.

3. A perceptual subspace approach for modeling of speech and audio signals with damped sinusoids [J]. Jensen J., Heusdens R., Jensen S.H. IEEE Transactions on Speech and Audio Processing, 2004, no. 2.

4. Perceptual Time Varying Linear Prediction model for speech applications [C]. Gamliel O., Shallom I.D. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), 2009.

5. Linear prediction of temporal envelopes for speech and audio applications [D]. Athineos, Marios. 2007.

6. The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech [O]. Raphaël Thézé, Anne-Lise Giraud, Pierre Mégevand. 2020.

7. Reversible Audio Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 Speech [O]. Akira Nishimura. 2016.

8. The Use of a Two-Pole Linear Prediction Model in Speech Recognition [R]. Makhoul, John; Wolf, Jared. 1974.

