This paper describes a new pre-processor for a free-speech transcription system. It partitions the audio into speech and non-speech, segments the speech portions into speaker turns, and clusters the speaker turns. It operates in a stream-based mode and aims for high accuracy with low delay and low processing time. Experiments on the Hub4 Broadcast News corpus show that the proposed pre-processor is competitive with, and in some respects better than, the best systems published so far. The paper also describes attempts to improve system performance by supplementing the standard MFCC features with prosodic features such as pitch and voicing evidence.