Text classification is an important technique in text data mining and information retrieval, and computing term weights is the key step in classifying text. To address the shortcomings of the classic term weighting method TF-IDF, this paper proposes a dynamic self-adaptive term weighting method (DATW) for multi-class text classification. DATW considers not only the frequency of a term within a text and the number of texts in the training set that contain the term, but also the term's distribution coefficient (dispersion) and the feature-vector gradient difference, so that the weighting adapts dynamically to the texts being classified. Experimental results show that computing term weights with DATW effectively improves text classification performance and outperforms TF-IDF.
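For reference, the TF-IDF baseline that DATW is compared against combines the two quantities the abstract names first: a term's frequency within a text and the number of training texts containing it. The sketch below uses the common textbook form w(t, d) = tf(t, d) · log(N / df(t)) with raw counts and a natural log; these choices and the function name `tf_idf_weights` are illustrative assumptions, not taken from the paper, and the sketch does not implement DATW itself, whose formulas are not given in this abstract.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Classic TF-IDF baseline (not DATW): w(t, d) = tf(t, d) * log(N / df(t)).

    docs: list of token lists, one list per training text.
    Returns one dict per text, mapping each term in it to its weight.
    """
    n_docs = len(docs)
    # df(t): number of training texts that contain term t at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        # tf(t, d): raw count of term t in this text.
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


if __name__ == "__main__":
    docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"], ["dog", "barked"]]
    for i, w in enumerate(tf_idf_weights(docs)):
        print(i, {t: round(v, 3) for t, v in w.items()})
```

These weights depend only on term and document counts, not on class labels, so a term that occurs in every training text gets weight 0 regardless of how it is spread across classes. Per the abstract, DATW additionally uses each term's distribution coefficient and the feature-vector gradient difference.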