This paper is to introduce an algorithm to cluster Chinese short texts for mining web topics based on Chinese chunks. Aiming at the characteristics of Chinese short texts, the algorithm employs N-gram feature extraction to capture Chinese chunks from texts, which reflect the text semantic structure and character dependency. Then RPCL algorithm is applied to realizing text clustering with high precision, which doesn't need know the exact number of clusters. Finally, the experiment results show that this approach can remarkably reduce the dimensionality and effectively improve the performance of Chinese short texts clustering than traditional methods.
展开▼