This paper investigates techniques for determining the repetition structure of musical audio. In particular, we treat segment similarity as a time series prediction problem and quantify similarity in terms of pairwise predictability between segments. To this end, we propose a novel approach based on multivariate time series modelling of audio features. Using chroma and MFCC features, and assuming that correct segment boundaries have been obtained beforehand, we evaluate the proposed approach on the Beatles dataset. We consider both the Queen Mary and the Tampere University versions of the dataset annotations. We obtain a maximum pairwise F-score of 84%. Compared to a randomised baseline approach, this result corresponds to a performance improvement of 26 percentage points.
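For readers unfamiliar with the evaluation measure, the following is a minimal sketch of how a pairwise F-score can be computed from two labelings of the same segments. It assumes the common definition in which precision and recall are taken over pairs of items that share a label; whether the pairs are formed over segments or over frames in the evaluation reported here is not stated in the abstract, and the function names `same_label_pairs` and `pairwise_f_score` are illustrative only.

```python
from itertools import combinations


def same_label_pairs(labels):
    """Return the set of index pairs (i, j), i < j, whose labels match."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f_score(reference, estimate):
    """Pairwise F-score between a reference and an estimated labeling.

    Both arguments are sequences of labels over the same items
    (assumed here to be segments with known, shared boundaries).
    """
    if len(reference) != len(estimate):
        raise ValueError("labelings must cover the same items")

    ref_pairs = same_label_pairs(reference)
    est_pairs = same_label_pairs(estimate)
    hits = len(ref_pairs & est_pairs)

    # Guard against empty pair sets (e.g. every item has a unique label).
    precision = hits / len(est_pairs) if est_pairs else 0.0
    recall = hits / len(ref_pairs) if ref_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: five segments, annotated vs. estimated structure labels.
reference = ["A", "B", "A", "B", "C"]
estimate = ["x", "y", "x", "x", "z"]
score = pairwise_f_score(reference, estimate)
```

In the toy example, one of the three estimated pairs is correct (precision 1/3) and one of the two reference pairs is recovered (recall 1/2), giving an F-score of 0.4. Label names need not match between the two labelings, since only the grouping of items matters.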