A method for categorizing documents is disclosed. The words in the documents are tagged according to their parts of speech. A first set of features is selected that corresponds to one of the parts of speech. The documents are grouped into clusters according to their semantic affinity to the first set of features and to each other. The clusters are then refined into a hierarchy of progressively finer clusters, with the features at each level selected according to a corresponding part of speech.
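The abstract does not name a tagger, a similarity measure, or a clustering algorithm, so the following Python sketch is illustrative only and is not the disclosed implementation. It assumes a hypothetical toy lexicon (`LEXICON`) in place of a real part-of-speech tagger, word-count vectors over the selected features, cosine similarity, and simple single-pass "leader" clustering with an assumed threshold of 0.5. Nouns drive the first level of clustering and verbs drive the refinement; the `pos_levels` and `threshold` parameters are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
LEXICON = {
    "dog": "NOUN", "cat": "NOUN", "stock": "NOUN", "market": "NOUN",
    "barks": "VERB", "sleeps": "VERB", "rises": "VERB", "falls": "VERB",
    "the": "DET", "a": "DET",
}


def tag(text):
    """Tag each word with a part of speech (unknown words become 'OTHER')."""
    return [(w, LEXICON.get(w, "OTHER")) for w in text.lower().split()]


def select_features(tagged_docs, pos):
    """Feature set: every distinct word in the given documents tagged with `pos`."""
    return sorted({w for doc in tagged_docs for w, t in doc if t == pos})


def vectorize(tagged_doc, feats, pos):
    """Count how often each feature word occurs (with tag `pos`) in one document."""
    counts = Counter(w for w, t in tagged_doc if t == pos)
    return [counts[f] for f in feats]


def cosine(u, v):
    """Cosine similarity; a zero vector is treated as similar to nothing."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(doc_ids, vectors, threshold=0.5):
    """Leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of doc ids; c[0] is its leader
    for i in doc_ids:
        for c in clusters:
            if cosine(vectors[i], vectors[c[0]]) >= threshold:
                c.append(i)
                break
        else:  # no sufficiently similar cluster: start a new one
            clusters.append([i])
    return clusters


def build_hierarchy(texts, pos_levels=("NOUN", "VERB"), threshold=0.5):
    """Cluster on the first part of speech, then refine each cluster on the next."""
    tagged = [tag(t) for t in texts]

    def refine(doc_ids, level):
        # Leaves are plain lists of document indices.
        if level == len(pos_levels) or len(doc_ids) < 2:
            return doc_ids
        pos = pos_levels[level]
        # Features are re-selected from only the documents in this cluster.
        feats = select_features([tagged[i] for i in doc_ids], pos)
        vectors = {i: vectorize(tagged[i], feats, pos) for i in doc_ids}
        return [refine(c, level + 1) for c in cluster(doc_ids, vectors, threshold)]

    return refine(list(range(len(texts))), 0)


if __name__ == "__main__":
    docs = ["the dog barks", "the dog sleeps", "a dog sleeps",
            "the stock rises", "the stock falls", "the market falls"]
    print(build_hierarchy(docs))  # [[[0], [1, 2]], [[3], [4]], [5]]
```

With these six documents, the noun level separates the "dog" documents from the "stock" and "market" documents, and the verb level then splits the "dog" cluster into barking and sleeping documents. Leader clustering was chosen only because it is short; its result depends on document order, and a real system would more likely use a stronger tagger and an agglomerative or k-means style clustering step.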