Efficient analysis and processing of audio signals would lead to better utilization of computer vision and machine learning technologies in automating audio-related applications. Audio and speech are highly non-stationary signals with a time-varying spectrum, which makes them difficult to analyze with simple signal processing tools. Most existing techniques segment the audio signal, assume it to be quasi-stationary within each short segment, and apply stationary signal processing tools. However, these approaches suffer from a fixed time-frequency resolution and cannot accurately model the time-varying characteristics of audio signals. An adaptive joint time-frequency (TF) approach would be the best way to analyze audio signals.