Common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. Text is represented by the words it mentions, and thematic similarity is based on the proportion of words that texts have in common. The complex is constructed from groups of co-occurring words (term associations) identified with traditional data mining methods. Disjoint subsections of the complex (connected components) represent general concepts within the documents' concept space. A new concept-based mining model, composed of four components, is proposed to improve text clustering quality. By exploiting the semantic structure of the sentences in documents, the model achieves better text clustering results.
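The word-overlap similarity and the "term associations to connected components" step can be illustrated with a small sketch. It is not the authors' model. It assumes naive whitespace tokenization, uses Jaccard overlap as one possible reading of "proportion of words in common", and stands in for "traditional data mining methods" with simple pair counting: a word pair becomes an association if it co-occurs in at least `min_support` documents. The function names and the `min_support` parameter are hypothetical. Only the graph of pairwise associations (the 1-skeleton of the complex) is built. Its connected components match those of the full complex, because higher-dimensional simplices never join pieces that their edges leave separate.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Hypothetical helper: lowercase whitespace tokenization into a word set."""
    return set(text.lower().split())


def word_overlap(a, b):
    """Jaccard overlap (assumed similarity): shared words / all distinct words."""
    wa, wb = tokenize(a), tokenize(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def term_associations(docs, min_support=2):
    """Word pairs co-occurring in at least `min_support` documents.

    Pair counting is an assumed stand-in for a frequent-itemset miner.
    """
    counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(tokenize(doc)), 2):
            counts[pair] += 1
    return [pair for pair, n in counts.items() if n >= min_support]


def concept_components(associations):
    """Connected components of the association graph, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in associations:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    # Sort for deterministic output: each component alphabetically, then by first term.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    docs = [
        "cats chase mice",
        "cats chase birds",
        "stocks bonds markets",
        "stocks bonds rates",
    ]
    print(word_overlap(docs[0], docs[1]))
    print(concept_components(term_associations(docs)))
```

On the toy corpus, only "cats/chase" and "bonds/stocks" reach the support threshold. This yields two disjoint components, each a rough "concept". A real system would replace the tokenizer and the pair counter with proper preprocessing and an itemset-mining algorithm.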