A cluster-weight calculating unit (322) uses a cluster-weight calculating NN to calculate, on the basis of signals of the voice of a specific speaker, weights for a mask calculating NN in which at least one layer is decomposed into a plurality of clusters, the weights corresponding individually to the plurality of clusters. A mask calculating unit (302) uses the mask calculating NN, weighted with the weights calculated by the cluster-weight calculating unit (322), to calculate a mask on the basis of feature quantities of observation signals of the voices of one or more speakers. The mask serves to extract feature quantities of the voice of the specific speaker from the feature quantities of the observation signals.
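The data flow described above can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the function names (`cluster_weights`, `compute_mask`), the layer sizes, the mean pooling over frames, the softmax normalization of the cluster weights, the tanh and sigmoid activations, and the choice to sum the cluster outputs after activation are all assumptions made for illustration, and the parameters are random rather than trained.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cluster_weights(speaker_feats, W_aux):
    """Hypothetical cluster-weight NN: pool the target speaker's frames,
    apply one linear layer, and normalize to one weight per cluster."""
    pooled = speaker_feats.mean(axis=0)          # (F,)
    return softmax(W_aux @ pooled)               # (K,)

def compute_mask(obs_feats, alpha, W_clusters, b_clusters, W_out, b_out):
    """Hypothetical mask NN whose hidden layer is split into K clusters.
    The cluster outputs are combined with the weights alpha (assumed rule)."""
    # Per-cluster hidden activations: (K, T, H)
    hidden = np.tanh(
        np.einsum("khf,tf->kth", W_clusters, obs_feats) + b_clusters[:, None, :]
    )
    combined = np.tensordot(alpha, hidden, axes=1)  # weighted sum over K: (T, H)
    return sigmoid(combined @ W_out.T + b_out)      # mask in [0, 1]: (T, F)

# Toy dimensions (assumed): T frames, F feature bins, H hidden units, K clusters.
T, F, H, K = 5, 4, 3, 2
rng = np.random.default_rng(0)

obs = np.abs(rng.standard_normal((T, F)))        # mixture features
target = np.abs(rng.standard_normal((7, F)))     # target-speaker features

W_aux = rng.standard_normal((K, F))
W_clusters = rng.standard_normal((K, H, F))
b_clusters = rng.standard_normal((K, H))
W_out = rng.standard_normal((F, H))
b_out = rng.standard_normal(F)

alpha = cluster_weights(target, W_aux)           # role of unit (322)
mask = compute_mask(obs, alpha, W_clusters, b_clusters, W_out, b_out)  # role of unit (302)
extracted = mask * obs                           # masked features of the target speaker
```

In this sketch the target-speaker signal influences the mask only through `alpha`, so the same mask-network parameters can be reused for different target speakers by recomputing the cluster weights.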