Convolutional neural networks have established a new standard in many machine learning applications, not only in image processing but also in audio processing. In this contribution, we investigate the interplay between the primary representation, which maps a raw audio signal to some kind of image (the feature), and the convolutional layers of the ensuing neural network. We introduce a new notion of equivalence of feature-network pairs and illustrate the relation between features and networks using mel-spectrogram input on the one hand and varying analysis windows on the other.