This paper presents a novel technique of document clustering based on frequent concepts. The proposed FCDC (Frequent Concepts based Document Clustering), a clustering algorithm works with frequent concepts rather than frequent itemsets used in traditional text mining techniques. Many well known clustering algorithms deal with documents as bag of words while they ignore the important relationship between words like synonym relationship. The proposed algorithm utilizes the semantic relationship between words to create concepts. It exploits the WordNet ontology in turn to create low dimensional feature vector which allows developing a more accurate clustering algorithm.
展开▼