PROBLEM TO BE SOLVED: To provide a technique for determining whether an utterance detected in input speech is suited to a prescribed target.

SOLUTION: A target speech estimation model learning device comprises: a speech detection section that, from input speech containing a speaker's voice and noise, detects the utterance corresponding to the speaker's voice and extracts an acoustic feature of that utterance; a speech recognition section that generates a set of speech recognition results with recognition scores from the utterance; a vector expression generation section that generates, from that scored result set, a set of word vector expressions and a set of part-of-speech vector expressions of the speech recognition results; and a target speech determination section that uses a target speech estimation model, which outputs the probability that an utterance detected in input speech is suited to the prescribed target, to determine from the utterance, the acoustic feature, the scored recognition result set, the word vector expression set, and the part-of-speech vector expression set whether the utterance is suited to the prescribed target, and that outputs the utterance and the scored recognition result set when it is.

SELECTED DRAWING: Figure 3
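The data flow between the vector expression generation section and the target speech determination section can be sketched roughly as follows. This is only an illustrative sketch: every name in it (`Utterance`, `Hypothesis`, `vectorize`, `determine_target_speech`, `toy_model`), the one-hot encoding, and the 0.5 threshold are assumptions made for the example and are not taken from the abstract, which does not specify the vector representation, the model architecture, or the decision rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


@dataclass
class Hypothesis:
    """One speech recognition result with its recognition score (hypothetical type)."""
    words: List[str]
    pos_tags: List[str]
    score: float


@dataclass
class Utterance:
    """Detected utterance plus its acoustic feature (hypothetical type)."""
    samples: List[float]
    acoustic_feature: Vector


def one_hot(token: str, vocab: Sequence[str]) -> Vector:
    # Assumed encoding; out-of-vocabulary tokens map to the all-zero vector.
    return [1.0 if token == entry else 0.0 for entry in vocab]


def vectorize(
    hyps: List[Hypothesis], word_vocab: Sequence[str], pos_vocab: Sequence[str]
) -> Tuple[List[List[Vector]], List[List[Vector]]]:
    """Build the word and part-of-speech vector expression sets, one list per hypothesis."""
    word_vecs = [[one_hot(w, word_vocab) for w in h.words] for h in hyps]
    pos_vecs = [[one_hot(p, pos_vocab) for p in h.pos_tags] for h in hyps]
    return word_vecs, pos_vecs


# The estimation model maps all five inputs to a probability in [0, 1].
Model = Callable[
    [Utterance, List[Hypothesis], List[List[Vector]], List[List[Vector]]], float
]


def determine_target_speech(
    utt: Utterance,
    hyps: List[Hypothesis],
    word_vecs: List[List[Vector]],
    pos_vecs: List[List[Vector]],
    model: Model,
    threshold: float = 0.5,  # assumed decision threshold
) -> Optional[Tuple[Utterance, List[Hypothesis]]]:
    """Return the utterance and scored results only when the model accepts them."""
    probability = model(utt, hyps, word_vecs, pos_vecs)
    if probability >= threshold:
        return utt, hyps
    return None


def toy_model(utt, hyps, word_vecs, pos_vecs) -> float:
    # Stand-in for a trained model: the best recognition score, clipped to [0, 1].
    return max(0.0, min(1.0, max(h.score for h in hyps)))


if __name__ == "__main__":
    utterance = Utterance(samples=[0.0, 0.1], acoustic_feature=[0.3])
    results = [Hypothesis(["turn", "on"], ["VERB", "ADP"], 0.9)]
    wv, pv = vectorize(results, ["turn", "on"], ["VERB", "ADP"])
    print(determine_target_speech(utterance, results, wv, pv, toy_model))
```

In a real system `toy_model` would be replaced by the learned target speech estimation model; the sketch only shows that the determination section consumes all five inputs and passes the utterance and scored results through when the estimated probability is high enough.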