The corpus generation device according to the embodiment includes a web page acquisition unit, a reference word acquisition unit, an assigning unit, and an output unit. The web page acquisition unit acquires a web page containing explanatory text data about a presentation target. The reference word acquisition unit acquires a reference word from the web page. A reference word is an attribute value related to the presentation target. A storage unit stores hierarchical relationship information, which represents the broader/narrower relationships between attribute values. From this storage unit, the assigning unit extracts a broader term that ranks higher than the reference word acquired by the reference word acquisition unit. It then assigns an attribute tag corresponding to the reference word to that broader term in the explanatory text data. The output unit outputs the explanatory text data, with the attribute tags assigned by the assigning unit, as corpus data.
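The processing flow of the units can be illustrated with a minimal Python sketch. It rests on several assumptions that the text above does not specify. The web page is taken as already fetched and parsed into a description plus an attribute dictionary, so no actual web access is performed. The storage unit is modeled as a child-to-parent dictionary. Every ancestor of the reference word counts as a "broader term". Tags are written inline as `<attribute>term</attribute>`. All names here are hypothetical: `HIERARCHY`, `WebPage`, `broader_terms`, `tag_description`, and `generate_corpus`. The sketch is illustrative only and is not the claimed implementation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical storage unit: child attribute value -> broader (parent) attribute value.
HIERARCHY = {
    "crimson": "red",
    "scarlet": "red",
    "red": "warm color",
}


@dataclass
class WebPage:
    # Explanatory text data about the presentation target (assumed already extracted).
    description: str
    # Attribute name -> reference word (attribute value) found on the page.
    attributes: dict = field(default_factory=dict)


def broader_terms(word, hierarchy):
    """Collect every ancestor of `word` by walking up the child->parent map."""
    terms, seen = [], {word}
    current = hierarchy.get(word)
    while current is not None and current not in seen:  # guard against cycles
        terms.append(current)
        seen.add(current)
        current = hierarchy.get(current)
    return terms


def tag_description(page, hierarchy):
    """Wrap each broader term in the description with the tag of its reference word's attribute."""
    term_to_tag = {}
    for attr_name, ref_word in page.attributes.items():
        for term in broader_terms(ref_word, hierarchy):
            term_to_tag.setdefault(term, attr_name)
    if not term_to_tag:
        return page.description
    # Single pass, longest terms first, so inserted tags are never re-matched.
    alternation = "|".join(re.escape(t) for t in sorted(term_to_tag, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternation + r")\b")

    def wrap(match):
        tag = term_to_tag[match.group(1)]
        return f"<{tag}>{match.group(1)}</{tag}>"

    return pattern.sub(wrap, page.description)


def generate_corpus(pages, hierarchy):
    """Output step: one tagged description per page, used as corpus data."""
    return [tag_description(p, hierarchy) for p in pages]
```

As an example, take a page whose color attribute is "crimson" and whose description says "bright red". Under these assumptions, the word "red" in the text is tagged as `<color>red</color>`, even though the page never uses the word "crimson" in its description.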