The performance of model-based noise suppression is strongly affected by variations in speaker characteristics and by the accuracy of the noise model. Jointly performing speaker adaptation and accurate noise model estimation is therefore crucial for improving model-based noise suppression. However, in the conventional approach, which incorporates a vector Taylor series-based approach, this joint processing is computationally intractable because the clean speech and noise signals are not directly observable. To overcome this problem, we investigate a way of achieving joint processing by using minimum mean squared error (MMSE) estimates of clean speech and noise. The MMSE estimates allow the joint processing parameters to be estimated flexibly and accurately without intractable computation or any approximation. However, the MMSE estimates of clean speech and noise contain estimation errors, and these errors often degrade the accuracy of parameter estimation. We therefore also employ a reliable data selection technique based on voice activity detection to estimate the joint processing parameters. The evaluation results show that the proposed reliable data selection method improves both parameter estimation and speech recognition accuracy.
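
The idea of reliable data selection can be illustrated with a minimal sketch. It is not the method evaluated in this work, only one plausible reading under stated assumptions: the voice activity detector is assumed to output a per-frame speech probability, the MMSE noise estimates are assumed to be given as a frames-by-dimensions array, and the noise model is assumed to be a single diagonal Gaussian. The function names, the threshold value, and the variance floor are all hypothetical.

```python
import numpy as np


def select_reliable_noise_frames(speech_prob, threshold=0.2):
    """Return a boolean mask of frames treated as reliably non-speech.

    Assumption: `speech_prob` holds a per-frame speech probability from
    some voice activity detector; `threshold` is a hypothetical setting.
    """
    speech_prob = np.asarray(speech_prob, dtype=float)
    return speech_prob < threshold


def estimate_noise_gaussian(noise_mmse, mask, var_floor=1e-6):
    """Estimate a diagonal Gaussian noise model from selected frames only.

    Assumption: `noise_mmse` is a (frames, dims) array of MMSE noise
    estimates. Frames outside `mask` are ignored, so estimation errors
    in speech-dominated frames do not affect the parameters.
    """
    noise_mmse = np.asarray(noise_mmse, dtype=float)
    selected = noise_mmse[mask]
    if selected.shape[0] == 0:
        raise ValueError("no reliable frames were selected")
    mean = selected.mean(axis=0)
    # Floor the variance to avoid degenerate (zero-variance) dimensions.
    var = np.maximum(selected.var(axis=0), var_floor)
    return mean, var


if __name__ == "__main__":
    # Toy data: the third frame is speech-dominated and has a poor noise estimate.
    noise_mmse = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
    speech_prob = [0.10, 0.05, 0.90]
    mask = select_reliable_noise_frames(speech_prob)
    mean, var = estimate_noise_gaussian(noise_mmse, mask)
    print(mean, var)
```

In this toy example the third frame is excluded, so its large noise estimate does not distort the mean or variance. The same masking pattern could in principle be applied with the opposite criterion (high speech probability) to pick frames for speaker adaptation, although that is again an assumption rather than something stated above.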