The objective of the present invention is to perform speaker diarization accurately even when a plurality of speakers are speaking simultaneously. The speaker diarization device divides each of a plurality of signals, obtained respectively from a plurality of audio signal input units, into a plurality of segments of a prescribed time width, extracts a feature amount from each segment, clusters the feature amounts extracted from the segments of all of the signals together, and performs speaker diarization on the basis of the clustering result. Before segmentation, the device detects voice sections, which are sections containing an audio signal, in each of the plurality of signals; it then divides only those voice sections into segments and extracts a feature amount from each resulting segment.
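The processing flow described above can be illustrated with a minimal sketch. The abstract does not specify the voice-section detector, the feature amount, or the clustering method, so everything concrete here is an assumption made for illustration: an energy threshold for voice-section detection, log band energies as the feature amount, and plain k-means as the clustering step. All function names and parameters (`detect_voice_mask`, `extract_feature`, `diarize`, `threshold`, and so on) are hypothetical.

```python
import numpy as np

def detect_voice_mask(signal, frame_len, threshold):
    """Assumed energy-based detector: True for each full frame whose mean power exceeds threshold."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).mean(axis=1) > threshold

def split_into_segments(signal, frame_len, voiced):
    """Return (start_sample, samples) for every voiced segment of the prescribed width."""
    return [(i * frame_len, signal[i * frame_len:(i + 1) * frame_len])
            for i in np.flatnonzero(voiced)]

def extract_feature(segment, n_bands=8):
    """Stand-in feature amount: log energy in n_bands equal-width frequency bands."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    return np.log(np.array([b.sum() for b in np.array_split(power, n_bands)]) + 1e-10)

def kmeans(features, k, n_iter=20):
    """Plain k-means with farthest-point initialisation (assumed clustering method)."""
    centers = [features[0]]
    for _ in range(1, k):
        d = np.min([np.sum((features - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def diarize(signals, frame_len, n_speakers, threshold=1e-3):
    """Cluster the segments of all input signals together; return (channel, start, speaker) triples."""
    feats, index = [], []
    for ch, sig in enumerate(signals):
        voiced = detect_voice_mask(sig, frame_len, threshold)
        for start, seg in split_into_segments(sig, frame_len, voiced):
            feats.append(extract_feature(seg))
            index.append((ch, start))
    if not feats:
        return []
    labels = kmeans(np.array(feats), n_speakers)
    return [(ch, start, int(lab)) for (ch, start), lab in zip(index, labels)]

# Synthetic two-input example: tones stand in for two speakers who overlap from 0.5 s to 1.0 s.
rate, frame_len = 8000, 400                      # 50 ms segments
t = np.arange(int(1.5 * rate)) / rate
ch0 = 0.5 * np.sin(2 * np.pi * 200 * t) * (t < 1.0)     # "speaker A" on input 0
ch1 = 0.5 * np.sin(2 * np.pi * 1500 * t) * (t >= 0.5)   # "speaker B" on input 1
result = diarize([ch0, ch1], frame_len, n_speakers=2)
```

Because each input signal is segmented separately and the features are pooled before clustering, the same time instant can receive two different speaker labels, one per input. In this sketch that is how simultaneous speech is represented; a real system would use speech features and a clustering method chosen for the application.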