In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of producing better classification accuracy than other state-of-the-art approaches without ad hoc parameter optimization. We present an ensemble of classifiers that performs competitively on different types of animal audio datasets using the same set of classifiers and parameter settings. To produce this general-purpose ensemble, we ran a large number of experiments that fine-tuned pretrained convolutional neural networks (CNNs) for different audio classification tasks (bird, bat, and whale audio datasets). Six different CNNs were tested, compared, and combined. Moreover, a further CNN, trained from scratch, was tested and combined with the fine-tuned CNNs. To the best of our knowledge, this is the largest study on CNNs in animal audio classification. Our results show that several CNNs can be fine-tuned and fused for robust and generalizable audio classification. Finally, the ensemble of CNNs is combined with handcrafted texture descriptors obtained from spectrograms for further improvement of performance.
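The abstract does not state how the classifier outputs are fused, so the following is only a minimal sketch of one common option, score-level fusion by the sum rule. The function names (`normalize`, `fuse_sum_rule`, `predict`), the optional weights, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one classifier's class scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def fuse_sum_rule(
    per_classifier_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse class scores from several classifiers by a (weighted) sum.

    Assumption: each inner sequence holds one classifier's scores for the
    same ordered set of classes. Equal weights are used when none are given.
    """
    if not per_classifier_scores:
        raise ValueError("at least one classifier is required")
    if weights is None:
        weights = [1.0] * len(per_classifier_scores)
    if len(weights) != len(per_classifier_scores):
        raise ValueError("one weight per classifier is required")

    n_classes = len(per_classifier_scores[0])
    fused = [0.0] * n_classes
    for weight, scores in zip(weights, per_classifier_scores):
        if len(scores) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for i, s in enumerate(normalize(scores)):
            fused[i] += weight * s
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical scores from three classifiers over three classes.
example_scores = [
    [0.6, 0.3, 0.1],  # e.g. one fine-tuned CNN
    [0.2, 0.5, 0.3],  # e.g. another fine-tuned CNN
    [0.1, 0.6, 0.3],  # e.g. a classifier on texture descriptors
]
fused_scores = fuse_sum_rule(example_scores)
predicted_class = predict(fused_scores)
```

In this illustration the first classifier alone would choose class 0, while the fused scores favor class 1, which shows how an ensemble can override a single disagreeing member. Whether the paper uses equal weights, learned weights, or a different fusion rule is not stated in the abstract.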