PROBLEM TO BE SOLVED: To provide a speech language corpus generation device that generates a speech language corpus for training an acoustic model used in speech recognition of a specific program.

SOLUTION: A speech language corpus generation device 1 includes mismatch probability calculation means 12 and corpus selection means 42. The mismatch probability calculation means 12 takes the recognition hypotheses obtained by recognizing the speech of a specific program, the subtitle text of that program, and the actual spoken lines, and calculates the probability that the subtitle text does not match the spoken lines. Each mismatch probability is associated with a correspondence pattern between the subtitle text and the recognition hypotheses. The corpus selection means 42 takes corpus candidate program speech (program speech that is a candidate for inclusion in the speech language corpus), obtains corpus candidate recognition hypotheses by speech recognition, and calculates the error rate of the corpus candidate subtitle text from the mismatch probabilities associated with the correspondence patterns between those hypotheses and the corpus candidate subtitle text. It then selects, as the speech language corpus, the corpus candidate program speech and corpus candidate subtitle text of each utterance section whose error rate is at or below a threshold.

SELECTED DRAWING: Figure 1
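The abstract does not define what a "correspondence pattern" is or how mismatch probabilities are combined into an error rate. The following Python sketch illustrates the two-stage idea under loudly stated assumptions: a correspondence pattern is taken to be the difflib alignment opcode ('equal', 'replace', 'delete') covering each subtitle word when aligned with the recognition hypothesis; the mismatch probability per pattern is a Laplace-smoothed count over training utterances where the actual spoken lines are known; and an utterance's error rate is the average mismatch probability over its subtitle words. All function names (word_patterns, mismatch_flags, estimate_mismatch_prob, error_rate, select_corpus) and parameters are hypothetical, not the patented method.

```python
from collections import Counter
from difflib import SequenceMatcher


def word_patterns(subtitle: list[str], hypothesis: list[str]) -> list[str]:
    """Hypothetical correspondence pattern: label each subtitle word with the
    difflib opcode ('equal', 'replace' or 'delete') covering it when the
    subtitle is aligned against the recognition hypothesis."""
    labels = [""] * len(subtitle)
    sm = SequenceMatcher(a=subtitle, b=hypothesis, autojunk=False)
    for tag, i1, i2, _j1, _j2 in sm.get_opcodes():
        for i in range(i1, i2):  # 'insert' spans are empty on the subtitle side
            labels[i] = tag
    return labels


def mismatch_flags(subtitle: list[str], spoken: list[str]) -> list[bool]:
    """True for each subtitle word that is not matched to the spoken lines."""
    flags = [True] * len(subtitle)
    sm = SequenceMatcher(a=subtitle, b=spoken, autojunk=False)
    for i1, _j1, size in sm.get_matching_blocks():
        for i in range(i1, i1 + size):
            flags[i] = False
    return flags


def estimate_mismatch_prob(training, smoothing: float = 1.0) -> dict[str, float]:
    """training: iterable of (subtitle, hypothesis, spoken) word lists.
    Returns a Laplace-smoothed estimate of P(mismatch | pattern)."""
    total, bad = Counter(), Counter()
    for subtitle, hypothesis, spoken in training:
        pats = word_patterns(subtitle, hypothesis)
        for pat, is_bad in zip(pats, mismatch_flags(subtitle, spoken)):
            total[pat] += 1
            bad[pat] += int(is_bad)
    return {p: (bad[p] + smoothing) / (total[p] + 2 * smoothing) for p in total}


def error_rate(subtitle, hypothesis, probs, unseen: float = 0.5) -> float:
    """Average mismatch probability over the subtitle words of one utterance;
    patterns never seen in training fall back to `unseen`."""
    pats = word_patterns(subtitle, hypothesis)
    if not pats:
        return 1.0  # empty subtitle: treat as unusable
    return sum(probs.get(p, unseen) for p in pats) / len(pats)


def select_corpus(candidates, probs, threshold: float) -> list[str]:
    """candidates: iterable of (utterance_id, subtitle, hypothesis).
    Keeps utterances whose estimated error rate is at or below threshold."""
    return [uid for uid, sub, hyp in candidates
            if error_rate(sub, hyp, probs) <= threshold]
```

In this sketch select_corpus returns only utterance IDs; the device described above would pair each selected utterance section's program speech with its subtitle text to form the corpus. Averaging per-word probabilities is just one plausible aggregation; a product of match probabilities or a per-utterance maximum would be equally valid assumptions.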