Enabling automatic human-level (or better) detection and classification of audio events and sound environments would be a clear plus for artificial intelligence (AI)-based applications such as robotics and social signal processing. Typical machine learning approaches to such analysis problems rely on extracting descriptive features from the raw data before semantic analysis; audio-specific feature proposals abound, from frame-based mel-frequency cepstral coefficients (MFCCs) to recurrence quantification analysis (RQA) data.
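To make the feature-extraction step concrete, the following is a minimal numpy sketch of frame-based MFCC computation (framing, power spectrum, mel filterbank, log, DCT-II). The function name, parameter defaults, and filterbank construction here are illustrative choices, not taken from any specific toolkit:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Illustrative frame-based MFCC extraction (parameters are assumptions)."""
    # 1. Slice the signal into overlapping windowed frames.
    frames = np.array([signal[i:i + n_fft]
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    frames = frames * np.hanning(n_fft)
    # 2. Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel-scale filterbank.
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # 4. Log mel energies (small floor avoids log of zero).
    logmel = np.log(power @ fbank.T + 1e-10)
    # 5. DCT-II decorrelates the log energies into cepstral coefficients.
    n = np.arange(n_mels)
    dct_mat = np.cos(np.pi / n_mels * (n[None, :] + 0.5)
                     * np.arange(n_ceps)[:, None])
    return logmel @ dct_mat.T  # shape: (n_frames, n_ceps)
```

For a one-second 16 kHz signal with these defaults, this yields a (61, 13) matrix: one 13-dimensional coefficient vector per analysis frame, the usual input to downstream semantic classifiers.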