Journal: MATEC Web of Conferences

A Chinese text classification system based on Naive Bayes algorithm



Abstract

This paper addresses the characteristics of Chinese text classification. Documents are segmented with ICTCLAS (the Chinese lexical analysis system of the Chinese Academy of Sciences), then cleaned and filtered for stop words. Information gain and document frequency feature selection algorithms are used to select document features. On this basis, a text classifier is implemented with the Naive Bayes algorithm, and the system is evaluated and analyzed experimentally on the Fudan University Chinese corpus.

Key words: Chinese word segmentation / Text categorization / Information gain / Naive Bayes algorithm
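The pipeline described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: documents are taken to be already segmented into token lists (ICTCLAS is not called here), the classifier is a multinomial Naive Bayes with add-one smoothing, and document frequency is applied as a threshold before ranking by information gain. The names `select_features`, `NaiveBayes`, `min_df`, and `top_k` are hypothetical, and the four toy documents are illustrative only and are not drawn from the Fudan University corpus.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for every term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t)."""
    with_t, without_t = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (with_t if term in tokens else without_t)[label] += 1
    gain = entropy(list(Counter(labels).values()))
    for part in (with_t, without_t):
        size = sum(part.values())
        if size:
            gain -= size / len(docs) * entropy(list(part.values()))
    return gain


def select_features(docs, labels, stop_words, min_df=2, top_k=1000):
    """Drop stop words and rare terms, then keep the top_k terms by IG.

    Assumption: DF is used as a pre-filter and IG as the ranking score.
    """
    df = document_frequency(docs)
    candidates = [t for t, c in df.items() if c >= min_df and t not in stop_words]
    ranked = sorted(candidates, key=lambda t: (-information_gain(docs, labels, t), t))
    return set(ranked[:top_k])


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(docs)) for c in self.classes}
        term_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            term_counts[label].update(t for t in tokens if t in vocab)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(term_counts[c].values()) + len(vocab)
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + 1) / total) for t in vocab
            }
        return self

    def predict(self, tokens):
        """Return the class with the highest log posterior; unknown terms are ignored."""
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in tokens if t in self.vocab
            )
        return max(self.classes, key=score)


# Toy, pre-segmented documents (illustrative data only).
docs = [
    ["股票", "市场", "上涨", "的"],
    ["股票", "基金", "投资", "的"],
    ["球队", "比赛", "胜利", "的"],
    ["球队", "球员", "比赛", "的"],
]
labels = ["finance", "finance", "sports", "sports"]
stop_words = {"的"}

vocab = select_features(docs, labels, stop_words, min_df=2, top_k=3)
model = NaiveBayes().fit(docs, labels, vocab)
print(model.predict(["股票", "下跌"]))
print(model.predict(["比赛", "球队", "精彩"]))
```

Scoring in log space avoids numerical underflow when many term probabilities are multiplied, and add-one smoothing keeps a term that never occurs in a class from driving that class's probability to zero.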
