Separation of singing voice from music accompaniment is a topic of great utility in many application of Music Information Retrieval. In the context of stereophonic music mixtures, many algorithms face this problem making use of the spatial diversity of the sound sources to localize and isolate the singing voice. Although these spatial approaches can obtain acceptable results, the separated signal usually is affected by a high level of distortions and artifacts. In this paper, we propose a method for improving the isolation of the singing voice in stereo recordings based on incorporating the fundamental frequency (FO) information to the separation process. First, the singing voice is pre-separated from the input mixture using a state-of-the-art stereo source separation method, the MuLeTs algorithm. Then, the FO of this pre-separated signal is obtained using a robust pitch estimator based on the computation of the difference function and Hidden Markov Models, obtaining a smooth pitch contour with voiced/unvoiced decisions. A binary mask is finally constructed from FO to isolate the singing voice from the original mix. The method has been tested on studio music recordings, obtaining good separation results.
展开▼