International Conference on Document Analysis and Recognition

Automatic Chinese Text Classification Using Character-Based and Word-Based Approach

Abstract

In this paper, we study Chinese text classification using a character-based approach (N-gram) and a word-based approach, and propose the use of uni-gram, bi-gram, and word features of length greater than or equal to three. We also introduce a weight coefficient that can be used to give higher weights to word features. We further investigate a serial approach based on feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.
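
The feature scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_features`, the parameter `word_weight`, and the choice to apply the weight coefficient as a simple multiplier on word counts are all assumptions made for illustration, and the word list is assumed to come from some external Chinese word segmenter.

```python
from collections import Counter


def extract_features(text, words, word_weight=2.0):
    """Combine character uni-grams, character bi-grams, and long-word features.

    text:        raw string of Chinese characters.
    words:       tokens produced by a word segmenter (assumed to be supplied
                 by the caller; no particular segmenter is implied).
    word_weight: hypothetical coefficient that up-weights word features;
                 the abstract does not say how the weight is applied.
    """
    chars = [c for c in text if not c.isspace()]
    features = Counter()

    # Character uni-grams: one count per character occurrence.
    for c in chars:
        features[("uni", c)] += 1.0

    # Character bi-grams: one count per adjacent character pair.
    for a, b in zip(chars, chars[1:]):
        features[("bi", a + b)] += 1.0

    # Word features: only words of length >= 3, scaled by the weight.
    for w in words:
        if len(w) >= 3:
            features[("word", w)] += word_weight

    return features


feats = extract_features("中文文本分类", ["中文", "文本分类"], word_weight=2.0)
```

In this sketch the two-character word "中文" contributes no word feature because it is already covered by the bi-gram features, while the four-character word "文本分类" is added with the higher weight.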
