This thesis investigates the design and evaluation of an embedded optimization framework for the perceptual enhancement of audio signals that are degraded by linear and/or nonlinear distortion. In general, audio signal enhancement aims to improve the perceived audio quality, speech intelligibility, or another desired perceptual attribute of the distorted audio signal by applying a real-time digital signal processing algorithm. In the designed embedded optimization framework, the audio signal enhancement problem under consideration is formulated and solved as a per-frame numerical optimization problem, which makes it possible to compute the enhanced audio signal frame that is optimal with respect to a desired perceptual attribute. The first stage of the framework consists of formulating the per-frame optimization problem so as to maximally enhance the desired perceptual attribute, by explicitly incorporating a suitable model of human sound perception. The second stage consists of solving the formulated per-frame optimization problem on-line, using a fast and reliable optimization method that exploits the inherent structure of the problem. The framework is applied to four commonly encountered and challenging audio signal enhancement problems, namely hard clipping precompensation, loudspeaker precompensation, declipping, and multi-microphone dereverberation.
The first part of this thesis focuses on precompensation algorithms, in which the audio signal enhancement operation is applied before the distortion process affects the audio signal. More specifically, the problems of hard clipping precompensation and loudspeaker precompensation are tackled in the embedded optimization framework.
In the context of hard clipping precompensation, an objective function reflecting the perceptible nonlinear hard clipping distortion is constructed by including frequency weights based on the instantaneous masking threshold, which is computed on a frame-by-frame basis by applying a perceptual model. The resulting per-frame convex quadratic optimization problems are solved efficiently using an optimal projected gradient method, for which theoretical complexity bounds are derived. Moreover, a fixed-point hardware implementation of this optimal projected gradient method on a field-programmable gate array (FPGA) shows that the algorithm is capable of running in real time and without perceptible audio quality loss on a small and portable audio device.
In the context of loudspeaker precompensation, an objective function reflecting the perceptible combined linear and nonlinear loudspeaker distortion is constructed in the same way as for hard clipping precompensation. The loudspeaker is modeled using a Hammerstein loudspeaker model, i.e., a cascade of a memoryless nonlinearity and a linear FIR filter. The resulting per-frame nonconvex optimization problems are solved efficiently using gradient optimization methods that exploit knowledge of the invertibility and the smoothness of the memoryless nonlinearity in the Hammerstein loudspeaker model. From objective and subjective evaluation experiments, it is concluded with statistical significance that the embedded optimization algorithms for hard clipping and loudspeaker precompensation improve the resulting audio quality when compared to standard precompensation algorithms.
The second part of this thesis focuses on recovery algorithms, in which the audio signal enhancement operation is applied after the distortion process affects the audio signal. More specifically, the problems of declipping and multi-microphone dereverberation are tackled in the embedded optimization framework.
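To illustrate the kind of per-frame problem that arises in hard clipping precompensation in the first part, the following is a minimal sketch, not the implementation developed in the thesis. It assumes a frequency-weighted squared-error objective between the input frame `x` and the precompensated frame `y`, a box constraint `[-u, u]` representing the clipping level, and strictly positive weights `w`. The weights here are hypothetical stand-ins for perceptual weights derived from a masking threshold, and all function names are illustrative. The solver is an accelerated projected gradient iteration with constant step size and constant momentum, one standard member of the family of optimal first-order methods.

```python
import cmath
import math


def dft(x):
    """Unitary DFT, naive O(N^2); adequate for a short illustrative frame."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of the unitary DFT above."""
    n = len(X)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            for t in range(n)]


def objective(y, x, w):
    """f(y) = 0.5 * sum_k w[k] * |DFT(y - x)[k]|^2 (assumed weighted distortion)."""
    err = dft([a - b for a, b in zip(y, x)])
    return 0.5 * sum(wk * abs(ek) ** 2 for wk, ek in zip(w, err))


def gradient(y, x, w):
    """Gradient of the objective with respect to the real-valued frame y."""
    err = dft([a - b for a, b in zip(y, x)])
    g = idft([wk * ek for wk, ek in zip(w, err)])
    return [v.real for v in g]


def clip(v, u):
    """Euclidean projection onto the box [-u, u]^N, i.e. hard clipping."""
    return [max(-u, min(u, a)) for a in v]


def precompensate(x, w, u, iters=200):
    """Accelerated projected gradient: step 1/L, constant momentum beta."""
    L, mu = max(w), min(w)  # Lipschitz / strong-convexity constants (unitary DFT)
    beta = (math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))
    y = clip(x, u)          # feasible starting point
    v = list(y)             # extrapolated point
    for _ in range(iters):
        g = gradient(v, x, w)
        y_next = clip([a - b / L for a, b in zip(v, g)], u)
        v = [a + beta * (a - b) for a, b in zip(y_next, y)]
        y = y_next
    return y


# Example frame: a sine whose peaks exceed the clipping level u = 0.5.
N = 16
x = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
# Hypothetical weights, symmetric in frequency and strictly positive.
w = [1.0 + 9.0 * min(k, N - k) / (N / 2) for k in range(N)]
y = precompensate(x, w, u=0.5)
```

With uniform weights the sketch reduces to plain hard clipping of `x`. Non-uniform weights let the solver trade error between frequency bins, which is the role the perceptual weights play in the formulation described above.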
Declipping is formulated as a sparse signal recovery problem in which the recovery is performed by solving a per-frame ℓ1-norm minimization problem that includes frequency weights based on the instantaneous masking threshold. As a result, the declipping algorithm aims to maximize the perceived audio quality rather than the physical signal reconstruction quality of the declipped audio signal. Comparative objective and subjective evaluation experiments reveal with statistical significance that the proposed embedded optimization declipping algorithm improves the resulting audio quality compared to existing declipping algorithms.
Multi-microphone dereverberation is formulated as a nonconvex optimization problem, allowing for the joint estimation of the clean audio signal and the room acoustics model parameters. It is shown that the nonconvex optimization problem can be smoothed by including regularization terms based on a statistical late reverberation model and a sparsity prior for the clean audio signal, and that this regularization improves the dereverberation performance.
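For concreteness, one common way to write a perceptually weighted ℓ1-norm declipping problem of the kind described above is sketched below. This is an illustrative formulation under stated assumptions and not necessarily the exact problem solved in the thesis. All symbols are assumptions introduced here: y is the clipped frame, s the recovered frame, D a sparsifying transform such as a DFT or DCT, W a diagonal matrix of perceptual weights derived from the masking threshold, θ the clipping level, R the set of unclipped sample indices, and C⁺ and C⁻ the sets of positively and negatively clipped sample indices.

```latex
\begin{aligned}
\min_{\mathbf{s}\in\mathbb{R}^{N}}\quad
  & \|\mathbf{W}\mathbf{D}\mathbf{s}\|_{1} \\
\text{s.t.}\quad
  & s_n = y_n,      && n \in \mathcal{R}, \\
  & s_n \ge \theta,  && n \in \mathcal{C}^{+}, \\
  & s_n \le -\theta, && n \in \mathcal{C}^{-}.
\end{aligned}
```

The constraints keep the reliable samples unchanged and force the recovered samples at clipped positions to lie beyond the clipping level, while the weighted ℓ1 objective favors sparse solutions whose transform coefficients are penalized according to their perceptual weight.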