This paper applies a KD-Tree to the KNN text classification algorithm. A KD-Tree is first built from the training text set. The tree is then searched for all ancestor nodes of the test text, and the set of these ancestor texts is taken as the nearest-neighbor set of the test text. The test text is assigned the class of the ancestor text that has the highest similarity to it. This approach greatly reduces the number of text vectors that take part in the comparison, and its time complexity is only O(log₂ N). Experiments show that the improved KNN text classification algorithm achieves higher classification efficiency than traditional KNN text classification.
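The procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the tree is built or which similarity measure is used, so the following are assumptions made only for illustration: texts are already represented as fixed-length numeric vectors, the tree is built by median splits on cycling axes, similarity is cosine similarity, and the names `Node`, `build_kdtree`, `ancestors`, `cosine`, and `classify` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    vector: Sequence[float]          # feature vector of one training text
    label: str                       # class of that training text
    axis: int                        # dimension this node splits on
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(items: List[Tuple[Sequence[float], str]], depth: int = 0) -> Optional[Node]:
    """Build a KD-Tree by median split (an assumed construction scheme)."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    vector, label = items[mid]
    return Node(vector, label, axis,
                build_kdtree(items[:mid], depth + 1),
                build_kdtree(items[mid + 1:], depth + 1))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed; the abstract does not name the measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ancestors(root: Optional[Node], query: Sequence[float]) -> List[Node]:
    """Collect every node on the path from the root to where `query` would be inserted."""
    path = []
    node = root
    while node is not None:
        path.append(node)
        go_left = query[node.axis] < node.vector[node.axis]
        node = node.left if go_left else node.right
    return path


def classify(root: Node, query: Sequence[float]) -> str:
    """Return the label of the ancestor text most similar to `query`."""
    best = max(ancestors(root, query), key=lambda n: cosine(n.vector, query))
    return best.label
```

With a median-split tree the path returned by `ancestors` has about log₂ N nodes, so only that many similarity comparisons are made per test text, instead of N as in a brute-force KNN scan. In this sketch only the nodes on that single path are compared, so the most similar training text overall is not guaranteed to be among them.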