Time-frequency representations are ubiquitous in speech and audio signal processing, motivated both by auditory physiology and by the mathematics of Fourier analysis. Nonparametric statistical models (or, equivalently, transform-based signal processing methods) formulated in this space provide a principled way to decompose sounds into their constituent parts, as well as an effective means of exploiting the local correlation present in the time-frequency structure of naturally generated acoustic signals. Here we describe how an appropriate generative statistical model, even under very simple assumptions, provides a means of exploring sparse time-frequency representations in audio. We introduce a symmetrized lognormal model for spectral coefficients, which agrees well with data across a broad range of speech samples taken from the TIMIT database, and we demonstrate preliminary speech enhancement results based on a maximum a posteriori (MAP) shrinkage estimator.
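As an illustration only, and not the authors' implementation, the following sketch shows one way a symmetrized lognormal prior can drive a MAP shrinkage estimate. The abstract does not specify the parameterization or the noise model, so both are assumptions here: the coefficient magnitude |x| is taken to be lognormal with parameters `mu` and `sigma`, its sign is equally likely to be positive or negative, and the observation is y = x + n with Gaussian noise of known standard deviation `noise_std`. The function names are hypothetical, and the grid search is a simple numerical stand-in for whatever optimizer the paper uses.

```python
import math

def symlognormal_logpdf(x, mu, sigma):
    """Log density of a symmetrized lognormal (assumed form):
    |x| ~ LogNormal(mu, sigma), sign of x equally likely +/-."""
    a = abs(x)
    if a == 0.0:
        return -math.inf  # the density vanishes at the origin
    z = (math.log(a) - mu) / sigma
    return -math.log(2.0 * a * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def map_shrink(y, mu, sigma, noise_std, n_grid=2000):
    """Approximate MAP estimate of x from y = x + n, n ~ N(0, noise_std^2),
    under the symmetrized lognormal prior, via a grid search over |x|."""
    # Mode of |x| under the lognormal prior.
    mode = math.exp(mu - sigma ** 2)
    # Outside [min(|y|, mode), max(|y|, mode)] the log-likelihood and
    # log-prior slopes share a sign, so the maximum lies in this interval.
    lo, hi = sorted((abs(y), mode))
    lo = max(lo, 1e-12 * hi)  # avoid log(0) when y == 0
    best_a, best_val = lo, -math.inf
    for k in range(n_grid + 1):
        a = lo + (hi - lo) * k / n_grid
        # Log posterior (up to a constant) with x = sign(y) * a.
        val = (-((abs(y) - a) ** 2) / (2.0 * noise_std ** 2)
               + symlognormal_logpdf(a, mu, sigma))
        if val > best_val:
            best_a, best_val = a, val
    # A symmetric prior means the estimate keeps the sign of the observation.
    return math.copysign(best_a, y)
```

For example, `map_shrink(3.0, 0.0, 1.0, 2.0)` returns a value strictly between the prior mode `exp(-1)` and the observation 3.0, whereas with small noise (`noise_std=0.1`) a large observation such as 10.0 is shrunk only slightly. Because this prior has zero density at the origin, the estimate moves toward the prior mode rather than being set exactly to zero, as a hard-thresholding rule would do.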