Conventional improvements to text feature extraction are inadequate at quantifying how features are distributed, which largely limits their classification performance. To address this problem, an improved feature extraction method based on Least Document Frequency (LDF), the TF-LDF algorithm, is proposed. The algorithm uses LDF to quantify the inter-class concentration and intra-class dispersion of features, so it reflects the feature distribution more accurately. Experimental comparisons show that TF-LDF achieves better classification results.
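The abstract does not state the LDF formula, so the sketch below is only a hypothetical illustration of the general idea, not the paper's method. It assumes that "document frequency" is measured per class as the fraction of that class's documents containing a term, that LDF is the smallest such fraction across classes, and that a term is weighted by its TF times the gap between its largest and smallest class-wise document frequency. The functions `class_doc_freq`, `least_doc_freq`, and `tf_ldf_weight` are hypothetical names.

```python
from collections import Counter


def class_doc_freq(docs, labels, term):
    """Fraction of documents in each class that contain `term` (assumed per-class DF)."""
    totals = Counter(labels)
    hits = Counter(label for doc, label in zip(docs, labels) if term in doc)
    # Classes with no matching document get 0 because Counter defaults missing keys to 0.
    return {c: hits[c] / totals[c] for c in totals}


def least_doc_freq(docs, labels, term):
    """Hypothetical LDF: the smallest class-normalised document frequency."""
    return min(class_doc_freq(docs, labels, term).values())


def tf_ldf_weight(tf, docs, labels, term):
    """Hypothetical TF-LDF-style weight.

    A term that is frequent inside one class (high max DF) but rare in some
    other class (low LDF) gets a large weight; a term spread evenly over all
    classes gets a weight near zero.
    """
    dfs = class_doc_freq(docs, labels, term)
    return tf * (max(dfs.values()) - min(dfs.values()))


# Toy corpus: each document is a set of tokens.
docs = [{"goal", "match"}, {"goal"}, {"vote", "goal"}, {"vote"}]
labels = ["sport", "sport", "politics", "politics"]
print(class_doc_freq(docs, labels, "goal"))    # {'sport': 1.0, 'politics': 0.5}
print(tf_ldf_weight(3, docs, labels, "goal"))  # 1.5
print(tf_ldf_weight(2, docs, labels, "vote"))  # 2.0
```

Here "vote" scores higher than "goal" despite a lower TF, because it is concentrated in one class, which is the kind of distribution information the abstract argues conventional weighting misses.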