Source: Computational Linguistics and Intelligent Text Processing (conference proceedings)

A Study on Feature Weighting in Chinese Text Categorization



Abstract

In text categorization (TC) based on the vector space model, feature weighting and feature selection are the major difficulties. This paper proposes two feature weighting methods that combine the relevant influential factors. A TC system for Chinese texts is designed using character bigrams as features. Experiments on a collection of 71,674 texts show that the system achieves an F1 of 85.9%, about 5% higher than the well-known TF*IDF weighting scheme. Moreover, a multi-step feature selection process is used to reduce the dimension of the feature space effectively.
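
For readers unfamiliar with the baseline, the following is a minimal sketch of character-bigram extraction with plain TF*IDF weighting, plus the standard F1 formula. It does not reproduce the two weighting methods the paper proposes, which the abstract does not specify. The function names (`char_bigrams`, `tfidf_vectors`, `f1_score`), the unsmoothed `log(N / df)` form of IDF, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Return overlapping character bigrams, skipping whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def tfidf_vectors(docs):
    """Weight each bigram by tf * log(N / df); returns one dict per document.

    This is one common TF*IDF variant (assumed here); the paper may use another.
    """
    bigram_lists = [char_bigrams(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each bigram.
    df = Counter()
    for bigrams in bigram_lists:
        df.update(set(bigrams))

    vectors = []
    for bigrams in bigram_lists:
        tf = Counter(bigrams)
        vectors.append({b: tf[b] * math.log(n_docs / df[b]) for b in tf})
    return vectors


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy documents, invented for illustration only.
docs = ["中文文本分类", "文本特征加权", "特征选择方法"]
vectors = tfidf_vectors(docs)
```

Under this variant, a bigram that occurs in every document gets weight zero, while one confined to a single document gets the largest IDF factor.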
