In this paper we present our system for speaker diarization of broadcast news, based on recent advances in speaker recognition. Speaker segments determined by the speaker change-point detector are represented by i-vectors, and the similarity of the segments' speakers is evaluated using cosine distance scoring. Linear discriminant analysis is employed to cope with intra-speaker variability. The experiments were carried out on the COST278 multilingual broadcast news database. We demonstrate improved performance over a baseline system based on the Bayesian Information Criterion (BIC) and highlight the significant impact of cepstral mean normalization. Finally, two-stage clustering, in which BIC-based clustering pre-clusters the segments in the first stage, is examined and shown to yield a further performance improvement. The best-performing configuration of our system achieved a 52.4% relative improvement in speaker error rate over the baseline.
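As a rough illustration of cosine scoring between two segment representations, the following minimal sketch compares two vectors by the cosine of the angle between them. It is not the system's implementation: the function names, the toy vectors, and the decision threshold are hypothetical, and i-vector extraction and the LDA projection are assumed to have been applied beforehand.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: treat two segments as the same speaker
    when their cosine score reaches the threshold (value is an assumption)."""
    return cosine_score(a, b) >= threshold


# Toy vectors standing in for (already LDA-projected) segment i-vectors.
seg_a = [0.2, -0.1, 0.7]
seg_b = [0.4, -0.2, 1.4]   # same direction as seg_a, different length
seg_c = [-0.7, 0.1, 0.2]

print(same_speaker(seg_a, seg_b))
print(same_speaker(seg_a, seg_c))
```

Because the score depends only on the angle between the vectors, rescaling either vector leaves it unchanged; in practice the threshold would be tuned on development data rather than fixed in advance.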