K-means is a widely used clustering algorithm that is often applied to text categorization as an unsupervised method. However, it is easily affected by isolated observations. The BP (back-propagation) neural network is commonly used for text categorization because of its superiority in handling non-linear problems. However, it does not always achieve high performance. Combining these two algorithms, we propose a new text categorization algorithm. We first improve the k-means clustering algorithm and use it to cluster the vectors in our vector space model. A BP neural network is then used to categorize the preprocessed vectors. Experiments show that our algorithm achieves higher performance than the traditional BP neural network text categorization method.
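To illustrate why isolated observations matter for k-means, the following is a minimal sketch and not the improved algorithm proposed here, whose details the abstract does not give. It runs plain Lloyd's k-means on toy 2-D points, then applies a hypothetical filter that drops points lying far from their centroid before re-clustering. The function names (`kmeans`, `drop_isolated`), the `factor` threshold, the deterministic first-k initialization, and the toy data are all assumptions made for illustration.

```python
import math
import statistics

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels

def drop_isolated(points, centroids, labels, factor=2.0):
    """Hypothetical filter: drop points whose distance to their centroid
    exceeds factor * the median distance within that cluster."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    kept = []
    for p, lab, d in zip(points, labels, dists):
        cluster = [x for x, l in zip(dists, labels) if l == lab]
        if d <= factor * statistics.median(cluster):
            kept.append(p)
    return kept

# Toy data: two tight groups plus one isolated observation at (5, 30).
points = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
          (10, 11), (11, 10), (11, 11), (5, 30)]

centroids_before, labels = kmeans(points, k=2)
kept = drop_isolated(points, centroids_before, labels)
centroids_after, _ = kmeans(kept, k=2)

print("before:", centroids_before)  # second centroid is pulled toward (5, 30)
print("after: ", centroids_after)
```

In a pipeline like the one described, the vectors that survive such a filter would be the preprocessed input passed on to the BP neural network classifier; that stage is omitted from this sketch.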