Comparing document contents is provided. An ontological concept is extracted from a text snippet of a corpus document. One or more feature vectors are constructed that include associative information that describes an ontology that includes the focused concept. A topic model is trained using the one or more feature vectors. First and second topic sets are respectively extracted from first and second documents using the topic model. One or more topics from the first topic set are matched, using the topic model, with one or more topics from the second topic set to construct a matched topic set. Semantic analyses are respectively performed on first and second text snippet sets, wherein the first and second text snippet sets are chosen based, at least in part, on the matched topic set. Text snippets are matched based, at least in part, on the first and second semantic analyses.
展开▼