We present some experiments in the semi-automatic extraction of singing voice structure. The main characteristic of the proposed approach is that the segmentation is performed specifically for each individual song using a process we call bootstrapping. In bootstrapping, a small random sampling of the song is annotated by the user. This annotation is used to learn the song-specific voice characteristics and the trained classifier is subsequently used to classify and segment the whole song. We present experimental results on a collection of pieces with jazz singers that show the potential of this approach and compare it with the traditional approach of using multiple songs for training. It is our belief that the idea of song-specific bootstrapping is applicable to other types of music and audio computer-supported annotation.
展开▼