The Internet has become the world's largest information repository. With the explosive growth of text data on the web, the drawbacks of conventional web retrieval, namely the long time needed to acquire and update web pages and its low precision, have become more obvious. This paper proposes a text mining algorithm based on a focused crawler. It uses a topic crawler algorithm to classify and integrate as many web pages as possible by topic, which greatly improves the ability to retrieve web pages. On this basis, the naive Bayes algorithm is adopted to perform text mining on the web data. The experimental results show that the algorithm is feasible and achieves higher recall and precision ratios for web pages.
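To illustrate the classification step, the following is a minimal sketch of a multinomial naive Bayes text classifier, together with a helper that computes the precision and recall ratios for one topic. It is not the paper's implementation: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, the Laplace smoothing parameter `alpha`, and the function `precision_recall` are all assumptions made for illustration, and the crawling step is assumed to have already produced page texts labeled by topic.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes classifier with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength (assumed value)
        self.class_doc_counts = Counter()       # number of training pages per topic
        self.word_counts = defaultdict(Counter) # word frequencies per topic
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def log_scores(self, text):
        # Returns log P(topic) + sum of log P(word | topic) for each topic.
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + self.alpha * vocab_size
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denominator)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def precision_recall(predicted, actual, topic):
    # Precision: share of pages predicted as `topic` that truly belong to it.
    # Recall: share of pages truly in `topic` that were predicted as such.
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == topic and a == topic)
    predicted_pos = sum(1 for p in predicted if p == topic)
    actual_pos = sum(1 for a in actual if a == topic)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Toy training data (invented for illustration, not from the paper's experiments).
pages = [
    "football match goal team",
    "team wins league match",
    "stock market shares price",
    "bank raises interest price",
]
topics = ["sports", "sports", "finance", "finance"]

classifier = NaiveBayesTextClassifier().fit(pages, topics)
```

Calling `classifier.predict(...)` on the text of a newly crawled page returns the topic with the highest posterior score. Words that never appeared in training are skipped, which is one common convention among several.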