The cluster weight calculation unit (322) uses a cluster weight calculation NN to calculate, based on a voice signal of a specific speaker, a weight for each of a plurality of clusters in a mask calculation NN in which at least one layer is decomposed into the plurality of clusters. The mask calculation unit (302) uses the mask calculation NN, weighted by the weights calculated by the cluster weight calculation unit (322), to calculate a mask from the feature amount of an observed voice signal of one or more speakers. The mask extracts the voice feature amount of the specific speaker from that observed feature amount.
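The data flow described above can be illustrated with a minimal sketch. It assumes details the abstract does not give. The cluster weights come from a single linear layer followed by a softmax over the speaker's feature vector. One hidden layer of the mask NN is decomposed into K clusters, and its output is the weighted sum of the per-cluster affine transforms. The mask is a sigmoid output applied to the observed features by element-wise multiplication. All function names, dimensions, and the random parameters are hypothetical and for illustration only.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def rand_mat(rows, cols):
    """Hypothetical small random weights, for illustration only."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cluster_weight_nn(speaker_feat, W_cw):
    """Assumed cluster weight NN: linear layer + softmax, one weight per cluster."""
    logits = matvec(W_cw, speaker_feat)
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clustered_layer(x, cluster_Ws, cluster_bs, alphas):
    """Layer decomposed into clusters: sum_k alpha_k * (W_k x + b_k)."""
    out = [0.0] * len(cluster_bs[0])
    for W_k, b_k, a_k in zip(cluster_Ws, cluster_bs, alphas):
        h = matvec(W_k, x)
        for i in range(len(out)):
            out[i] += a_k * (h[i] + b_k[i])
    return out


def mask_nn(obs_feat, cluster_Ws, cluster_bs, alphas, W_out, b_out):
    """Assumed mask NN: clustered hidden layer + ReLU, then sigmoid mask in (0, 1)."""
    h = [max(0.0, v) for v in clustered_layer(obs_feat, cluster_Ws, cluster_bs, alphas)]
    z = matvec(W_out, h)
    return [1.0 / (1.0 + math.exp(-(zi + bi))) for zi, bi in zip(z, b_out)]


random.seed(0)
F, H, K, D = 4, 3, 2, 5  # feature dim, hidden units, clusters, speaker-feature dim

speaker_feat = [random.uniform(0.0, 1.0) for _ in range(D)]  # from the specific speaker's voice
obs_feat = [random.uniform(0.0, 1.0) for _ in range(F)]      # one frame of the observed mixture

W_cw = rand_mat(K, D)
cluster_Ws = [rand_mat(H, F) for _ in range(K)]
cluster_bs = [[0.0] * H for _ in range(K)]
W_out, b_out = rand_mat(F, H), [0.0] * F

alphas = cluster_weight_nn(speaker_feat, W_cw)  # role of unit (322)
mask = mask_nn(obs_feat, cluster_Ws, cluster_bs, alphas, W_out, b_out)  # role of unit (302)
target_feat = [m * y for m, y in zip(mask, obs_feat)]  # estimated target-speaker features
print(alphas, mask, target_feat)
```

Because only the cluster weights depend on the speaker, the same mask NN parameters can, under these assumptions, be steered toward a different speaker simply by recomputing `alphas` from that speaker's voice signal.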