Embedded Optimization Algorithms for Perceptual Enhancement of Audio Signals

Abstract

This thesis investigates the design and evaluation of an embedded optimization framework for the perceptual enhancement of audio signals that are degraded by linear and/or nonlinear distortion. In general, audio signal enhancement aims to improve the perceived audio quality, speech intelligibility, or another desired perceptual attribute of the distorted audio signal by applying a real-time digital signal processing algorithm. In the designed embedded optimization framework, the audio signal enhancement problem under consideration is formulated and solved as a per-frame numerical optimization problem, which makes it possible to compute the enhanced audio signal frame that is optimal with respect to a desired perceptual attribute. The first stage of the framework consists in formulating the per-frame optimization problem aimed at maximally enhancing the desired perceptual attribute, by explicitly incorporating a suitable model of human sound perception. The second stage consists in solving the formulated per-frame optimization problem on-line, using a fast and reliable optimization method that exploits the inherent structure of the problem. This embedded optimization framework is applied to four commonly encountered and challenging audio signal enhancement problems: hard clipping precompensation, loudspeaker precompensation, declipping, and multi-microphone dereverberation. The first part of this thesis focuses on precompensation algorithms, in which the audio signal enhancement operation is applied before the distortion process affects the audio signal. More specifically, the problems of hard clipping precompensation and loudspeaker precompensation are tackled in the embedded optimization framework.
In the context of hard clipping precompensation, an objective function reflecting the perceptible nonlinear hard clipping distortion is constructed by including frequency weights based on the instantaneous masking threshold, which is computed on a frame-by-frame basis by applying a perceptual model. The resulting per-frame convex quadratic optimization problems are solved efficiently using an optimal projected gradient method, for which theoretical complexity bounds are derived. Moreover, a fixed-point hardware implementation of this optimal projected gradient method on a field-programmable gate array (FPGA) shows that the algorithm can run in real time and without perceptible audio quality loss on a small, portable audio device. In the context of loudspeaker precompensation, an objective function reflecting the perceptible combined linear and nonlinear loudspeaker distortion is constructed in a similar fashion to that for hard clipping precompensation. The loudspeaker is modeled using a Hammerstein loudspeaker model, i.e. a cascade of a memoryless nonlinearity and a linear FIR filter. The resulting per-frame nonconvex optimization problems are solved efficiently using gradient optimization methods that exploit knowledge of the invertibility and the smoothness of the memoryless nonlinearity in the Hammerstein loudspeaker model. From objective and subjective evaluation experiments, it is concluded with statistical significance that the embedded optimization algorithms for hard clipping and loudspeaker precompensation improve the resulting audio quality when compared to standard precompensation algorithms.
The second part of this thesis focuses on recovery algorithms, in which the audio signal enhancement operation is applied after the distortion process affects the audio signal. More specifically, the problems of declipping and multi-microphone dereverberation are tackled in the embedded optimization framework.
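To make the hard clipping precompensation idea from the first part more concrete, the following is a minimal sketch of a per-frame weighted quadratic problem with amplitude bounds, solved by a plain fixed-step projected gradient iteration. It is an illustration under stated assumptions, not the thesis algorithm: the abstract names an optimal (accelerated) projected gradient method, whereas this sketch uses the simpler non-accelerated variant; the orthonormal DCT, the placeholder `weights`, and all function names are hypothetical choices made here, and no perceptual model is applied.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def matvec(m, v):
    """Compute m @ v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matvec_t(m, v):
    """Compute transpose(m) @ v."""
    return [sum(m[k][i] * v[k] for k in range(len(m)))
            for i in range(len(m[0]))]

def clip(v, u):
    """Project onto the box [-u, u]^n, which is exactly hard clipping."""
    return [max(-u, min(u, s)) for s in v]

def objective(y, x, weights, d):
    """0.5 * sum_k w_k * (D (y - x))_k^2: frequency-weighted distortion."""
    err = matvec(d, [a - b for a, b in zip(y, x)])
    return 0.5 * sum(w * e * e for w, e in zip(weights, err))

def precompensate(x, weights, u, iters=200):
    """Minimize the weighted distortion subject to |y_i| <= u."""
    d = dct_matrix(len(x))
    # D is orthonormal, so the Hessian D^T W D has eigenvalues `weights`
    # and 1 / max(weights) is a safe fixed step size.
    step = 1.0 / max(weights)
    y = clip(x, u)  # start from the plainly hard-clipped frame
    for _ in range(iters):
        err = matvec(d, [a - b for a, b in zip(y, x)])
        grad = matvec_t(d, [w * e for w, e in zip(weights, err)])
        y = clip([a - step * g for a, g in zip(y, grad)], u)
    return y

# Demo frame: a sine that exceeds the clipping level u.
n, u = 32, 1.0
x = [1.5 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
# Placeholder weights (assumption): emphasize low-frequency bins.
weights = [1.0 / (1 + k) for k in range(n)]
d = dct_matrix(n)
y = precompensate(x, weights, u)
print("weighted distortion, hard clipping  :", objective(clip(x, u), x, weights, d))
print("weighted distortion, precompensated :", objective(y, x, weights, d))
```

Because the iteration starts at the hard-clipped frame and a projected gradient step of size 1/L does not increase a convex quadratic objective, the precompensated frame should be no worse than plain clipping under the chosen weights. In a perceptually motivated setting the weights would instead be derived per frame from the masking threshold.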
Declipping is formulated as a sparse signal recovery problem in which the recovery is performed by solving a per-frame l1-norm minimization problem that includes frequency weights based on the instantaneous masking threshold. As a result, the declipping algorithm focuses on maximizing the perceived audio quality rather than the physical signal reconstruction quality of the declipped audio signal. Comparative objective and subjective evaluation experiments reveal with statistical significance that the proposed embedded optimization declipping algorithm improves the resulting audio quality compared to existing declipping algorithms. Multi-microphone dereverberation is formulated as a nonconvex optimization problem, allowing for the joint estimation of the clean audio signal and the room acoustics model parameters. It is shown that the nonconvex optimization problem can be smoothed by including regularization terms based on a statistical late reverberation model and a sparsity prior for the clean audio signal, which is demonstrated to improve the dereverberation performance.
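One way to write a weighted l1-norm declipping problem of the kind described above is sketched below. The notation is assumed here for illustration and is not taken from the abstract: c is the coefficient vector of the frame in a sparsifying basis D, W is a diagonal matrix of positive frequency weights derived from the masking threshold, y is the clipped frame, θ is the clipping level, R is the index set of unclipped samples, and C+ and C- are the index sets of samples clipped at +θ and -θ.

```latex
\begin{aligned}
\min_{c} \quad & \lVert W c \rVert_1 \\
\text{s.t.} \quad & (D c)_i = y_i,      && i \in \mathcal{R}, \\
                  & (D c)_i \ge \theta,  && i \in \mathcal{C}^{+}, \\
                  & (D c)_i \le -\theta, && i \in \mathcal{C}^{-}.
\end{aligned}
```

The equality constraints keep the reliable samples unchanged, while the inequality constraints only require that the recovered samples lie beyond the clipping level, so the weighted sparsity objective decides how far. The exact constraint set and weighting used in the thesis may differ from this sketch.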

Bibliographic record

  • Author: Defraene Bruno
  • Year: 2013
  • Original format: PDF
  • Language of text: nl
