This thesis is concerned with noise reduction in the single-channel input case and speech source separation in the multi-channel input case. The source separation method is also applied to the speech source localization problem.

Various methods for multi-channel speech separation have been reported. Some methods transform the input time-domain signal into a frequency-domain signal and then apply a binary mask to it. In these methods, information about both the magnitude difference and the time delay difference is used to generate the binary mask. The present method uses only one of these two kinds of information. It is shown experimentally that this approach is effective for improving the separation performance and also makes it easier to increase the number of input channels.

The source separation method is applied to the source localization problem, where multiple speakers in three-dimensional space are to be localized. Because of the complexity of the geometrical calculation, localization of multiple speakers is generally difficult. The method presented here uses different frequencies for different speakers, based on the W-Disjoint Orthogonality assumption, to calculate the correlation function, yielding almost the same correlation function as that calculated for a single speaker. This method decomposes the multi-speaker localization problem into a source separation problem and a single-speaker localization problem, which makes the problem easier to solve.

Among the many approaches to single-channel noise reduction, some use a small speech database. Conventionally, the magnitude of the speech spectrum is stored in the database and the phase information is discarded. In the present method, speech waveforms are stored in the database without transformation.
Speech segments that are similar to the noisy input are searched for in the database, using correlation as the similarity measure, and concatenated to generate the output signal. Experimental results show that this method is effective for reducing a certain kind of noise.

To improve the performance of the above noise reduction method, modifications are made to the similarity measure. In the modified similarity measure, the correlation function is calculated using the frequencies that carry mostly clean speech information contained in the speech database, while frequencies occupied by noise are ignored as much as possible. It is shown experimentally that this method enhances the noise reduction performance for real environmental noise as well as instrumental music noise.
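
The frequency-selective similarity search described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function names `masked_correlation` and `best_segment`, the boolean bin mask `keep`, and the use of a zero-lag normalized correlation over one-sided FFT bins are all assumptions made for illustration. How the reliable frequencies are chosen is not specified here, so the mask is simply passed in by the caller.

```python
import numpy as np


def masked_correlation(x, y, keep):
    """Zero-lag normalized correlation of two equal-length frames,
    computed only over the rfft bins where `keep` is True.
    (Hypothetical helper; the mask selection rule is an assumption.)"""
    X = np.fft.rfft(x)[keep]
    Y = np.fft.rfft(y)[keep]
    # np.vdot conjugates its first argument: sum(conj(Y) * X).
    num = np.real(np.vdot(Y, X))
    den = np.linalg.norm(X) * np.linalg.norm(Y)
    return num / den if den > 0 else 0.0


def best_segment(noisy, database, keep):
    """Slide over a waveform database and return (start index, score)
    of the segment most similar to the noisy frame."""
    n = len(noisy)
    scores = [
        masked_correlation(noisy, database[i:i + n], keep)
        for i in range(len(database) - n + 1)
    ]
    best = int(np.argmax(scores))
    return best, scores[best]
```

In this sketch, a frame corrupted by noise confined to a few frequency bins can still be matched to its clean counterpart in the database, provided those bins are excluded by `keep`. The matched segments would then be concatenated to form the output, a step that is omitted here.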