Conference: Computational Linguistics and Intelligent Text Processing

Chinese Documents Classification Based on N-Grams


Abstract

Traditional Chinese document classifiers rely on keywords in the documents, which requires dictionary support and efficient segmentation procedures. This paper explores techniques for using N-gram information to categorize Chinese documents, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and, as a result, becomes domain and time independent. A Chinese document classification system following these techniques is implemented with Naive Bayes, kNN, and hierarchical classification methods. Experimental results show that the system achieves satisfactory performance, comparable to that of traditional classifiers.
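
To illustrate the general idea of segmentation-free classification, the following is a minimal sketch of character N-gram features fed to a multinomial Naive Bayes classifier. It is not the paper's implementation: the abstract does not give the N-gram length, the feature selection, or the smoothing scheme. The names `char_ngrams` and `NgramNaiveBayes`, the choice of bigrams, add-one smoothing, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams; no dictionary or word segmentation is used."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative sketch only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                              # n-gram length (assumed: bigrams)
        self.alpha = alpha                      # additive smoothing constant (assumed)
        self.class_docs = Counter()             # number of training documents per class
        self.gram_counts = defaultdict(Counter) # n-gram counts per class
        self.vocab = set()                      # all n-grams seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = char_ngrams(text, self.n)
            self.class_docs[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # log prior plus smoothed log likelihood of each n-gram
            score = math.log(n_docs / total_docs)
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for this example only.
train_docs = ["股票市场今日上涨", "银行利率下调", "足球比赛昨晚结束", "篮球联赛冠军"]
train_labels = ["finance", "finance", "sports", "sports"]
model = NgramNaiveBayes(n=2).fit(train_docs, train_labels)
print(model.predict("股票市场下跌"))
print(model.predict("足球联赛"))
```

Because the features are raw character sequences, new vocabulary needs no dictionary update, which is the sense in which such a classifier can be domain and time independent. A kNN variant could reuse the same `char_ngrams` features with a similarity measure over the N-gram count vectors.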
