High-dimensional text data are difficult to organize without clustering, and clustering them automatically is an important but challenging task. Many researchers are working to reduce the manual effort and background information that clustering requires. This paper proposes a model for document clustering that needs no background information. Term features are extracted from the documents and collected in a matrix of term-frequency values. A genetic algorithm then groups the terms into clusters according to content similarity, with a term-frequency distance used to evaluate the fitness of each chromosome. The cluster centers, which represent document terms, are obtained from the genetic algorithm, and its output is used as a training vector for identifying the cluster class of each document. Experiments were conducted on a real dataset of research articles from various fields of engineering. The results show that the proposed model improves the precision, recall, and accuracy of document clustering.
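The pipeline described above (term-frequency matrix, genetic search for cluster centers, distance-based fitness) can be illustrated with a minimal sketch. This is not the paper's implementation. The chromosome encoding (a tuple of document indices acting as centers), the Euclidean distance, the crossover and mutation operators, and all function names and parameter values are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def term_frequency_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term j in document i."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def distance(a, b):
    # Euclidean distance between two term-frequency vectors (assumed metric).
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fitness(chromosome, rows):
    # Total distance from every document to its nearest center; lower is better.
    return sum(min(distance(r, rows[c]) for c in chromosome) for r in rows)


def evolve(rows, k, pop_size=20, generations=30, seed=0):
    """Search for k center documents with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(rows)
    population = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half of the population unchanged.
        population.sort(key=lambda c: fitness(c, rows))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: draw k distinct centers from the union of both parents.
            pool = list(set(p1) | set(p2))
            child = rng.sample(pool, k)
            # Mutation: occasionally swap one center for an unused document.
            if rng.random() < 0.2:
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(tuple(child))
        population = survivors + children
    return min(population, key=lambda c: fitness(c, rows))


def assign(rows, centers):
    """Label each document with the position of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: distance(r, rows[centers[j]]))
        for r in rows
    ]


# Toy corpus (made up for this sketch, not the paper's dataset).
docs = [
    "gene mutation gene selection",
    "gene selection mutation",
    "circuit voltage circuit current",
    "voltage current circuit",
]
vocab, rows = term_frequency_matrix(docs)
best = evolve(rows, k=2)
labels = assign(rows, best)
print(best, labels)
```

On this toy corpus the two genetics documents should share one label and the two circuit documents the other. A real system would likely replace the raw counts with a weighted scheme and tune the population size, mutation rate, and number of generations; those choices are not specified in the abstract.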