Conference: Future Networks, 2010. ICFN '10

Study on Feature Selection and Weighting Based on Synonym Merge in Text Categorization


Abstract

Feature selection and weighting are among the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of each feature term in the classifier according to the term's strength. Experiments show that our method performs considerably better than several traditional feature selection methods and improves the performance of text categorization systems.
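The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual weight function, so the formula here is an assumption: term frequency multiplied by a strength of 1 - H(category | feature) / log(number of categories), a generic entropy-based weighting standing in for the authors' function. The thesaurus dictionary, the concept codes, and all function names are hypothetical; a real system would load concept codes from TongYiCi CiLin.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: each word maps to a made-up concept code.
# A real system would load these mappings from TongYiCi CiLin.
SYNONYM_TO_CONCEPT = {
    "电脑": "C_COMPUTER",
    "计算机": "C_COMPUTER",
    "快乐": "C_HAPPY",
    "高兴": "C_HAPPY",
}


def merge_synonyms(tokens):
    """Replace each token with its concept code; unknown tokens pass through."""
    return [SYNONYM_TO_CONCEPT.get(tok, tok) for tok in tokens]


def entropy_strength(docs, labels):
    """Return a strength in [0, 1] for every merged feature.

    Assumed formula (not taken from the paper):
    strength = 1 - H(category | feature) / log(number of categories),
    where the distribution is estimated from term occurrences per category.
    """
    per_feature = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for feat in merge_synonyms(tokens):
            per_feature[feat][label] += 1

    n_categories = len(set(labels))
    # Guard against division by zero when only one category is present.
    max_entropy = math.log(n_categories) if n_categories > 1 else 1.0

    strength = {}
    for feat, counts in per_feature.items():
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        strength[feat] = 1.0 - h / max_entropy
    return strength


def weight_document(tokens, strength):
    """Weight each merged feature by term frequency times its strength."""
    tf = Counter(merge_synonyms(tokens))
    return {feat: count * strength.get(feat, 0.0) for feat, count in tf.items()}


# Tiny already-segmented corpus, for illustration only.
train_docs = [["电脑", "新闻"], ["计算机", "快乐"], ["高兴", "新闻"]]
train_labels = ["tech", "tech", "life"]

strength = entropy_strength(train_docs, train_labels)
weights = weight_document(["电脑", "计算机", "新闻"], strength)
```

In this toy corpus the two synonyms for "computer" collapse into one feature that occurs only in the "tech" category, so it receives the maximum strength, while a feature spread evenly across both categories receives a strength near zero. Merging before counting is what addresses sparseness: each synonym alone would have fewer occurrences to estimate the distribution from.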
