Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents written in Arabic. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. This task is often affected by document length: useful information in the documents is often accompanied by a large amount of noise, so it is necessary to eliminate this noise while keeping the useful information in order to boost clustering performance. In this paper, we propose to evaluate the impact of text summarization using the Latent Semantic Analysis model on Arabic document clustering in order to solve the problems cited above, using five similarity/distance measures: Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient, and Averaged Kullback-Leibler Divergence, each evaluated twice: without and with stemming. Our experimental results indicate that the proposed approach effectively solves the problems of noisy information and document length, and thus significantly improves clustering performance.
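As an illustration of the five similarity/distance measures named above, the following is a minimal Python sketch that operates on two term-weight vectors of equal length. The abstract does not give the formulas, so everything here is an assumption: the function names are hypothetical, the Jaccard coefficient is written in its extended (Tanimoto) form for real-valued weights, and the averaged Kullback-Leibler divergence follows one commonly used weighted form. The definitions used in the experiments may differ.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two term-weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Extended Jaccard (Tanimoto) coefficient for real-valued weights (assumed form)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Convention chosen here: two all-zero vectors get similarity 0.0.
    return dot / denom if denom != 0 else 0.0


def pearson_correlation(a, b):
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def averaged_kl_divergence(a, b):
    """Symmetric, weighted average of per-term KL contributions (assumed form)."""
    total = 0.0
    for x, y in zip(a, b):
        s = x + y
        if s == 0:
            continue  # term absent from both documents contributes nothing
        pi1, pi2 = x / s, y / s
        w = pi1 * x + pi2 * y  # weighted average of the two term weights
        if x > 0:
            total += pi1 * x * math.log(x / w)
        if y > 0:
            total += pi2 * y * math.log(y / w)
    return total
```

Note that Euclidean distance and averaged KL divergence are distances (smaller means more similar), whereas the other three are similarities (larger means more similar), so a clustering routine would need to treat the two groups accordingly.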