Journal: 《计算机与数字工程》 (Computer & Digital Engineering)

Text Feature Selection Based on Inter-Class Dispersion and Intra-Class Concentration

Abstract

Feature selection is one of the key steps in text categorization, and the quality of the selected feature subset directly influences the categorization results. This paper first defines two measures of a feature's discriminative ability. The first is the degree to which the feature disperses documents across categories (inter-class dispersion). The second is the degree to which the feature concentrates documents within a category (intra-class concentration). For both measures, larger values are better. The two measures are then combined into a new feature selection method that evaluates candidate features on both criteria together, so that the resulting feature set is more representative. Simulation experiments show that the proposed method can improve text categorization performance to a certain extent.
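The abstract does not give the formulas for the two measures or the rule used to combine them, so the following Python sketch only illustrates the general idea under stated assumptions. Here inter-class dispersion is assumed to be the variance of a term's per-class document rate, intra-class concentration is assumed to be the highest fraction of any single class's documents that contain the term, and the two are assumed to be combined by multiplication. The function names `score_features` and `select_features` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def score_features(docs, labels):
    """Score each term as (assumed dispersion) * (assumed concentration).

    docs   : list of token lists, one per document
    labels : list of class labels, aligned with docs
    Both measures are illustrative assumptions, not the paper's formulas.
    """
    classes = sorted(set(labels))
    class_sizes = {c: labels.count(c) for c in classes}

    # df[term][c] = number of documents of class c that contain the term
    df = defaultdict(lambda: defaultdict(int))
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term][label] += 1

    scores = {}
    for term, per_class in df.items():
        # Fraction of each class's documents that contain the term
        rates = [per_class.get(c, 0) / class_sizes[c] for c in classes]
        mean_rate = sum(rates) / len(rates)
        # Assumed inter-class dispersion: variance of the per-class rates
        dispersion = sum((r - mean_rate) ** 2 for r in rates) / len(rates)
        # Assumed intra-class concentration: best coverage within one class
        concentration = max(rates)
        # Assumed combination rule: simple product of the two measures
        scores[term] = dispersion * concentration
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = score_features(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under these assumptions, a term that appears equally often in every class gets zero dispersion and therefore a zero score, while a term that covers most documents of one class and few documents of the others scores highest.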
