Chinese Documents Classification Based on N-Grams



Abstract

Traditional classifiers for Chinese documents are based on keywords in the documents, which requires dictionary support and efficient segmentation procedures. This paper explores techniques that use N-gram information to categorize Chinese documents, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and consequently becomes domain- and time-independent. A Chinese document classification system following these techniques is implemented with Naive Bayes, kNN, and hierarchical classification methods. Experimental results show that the system achieves satisfactory performance, comparable with that of traditional classifiers.
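
The idea of classifying without a dictionary or word segmenter can be illustrated with a minimal sketch. It assumes overlapping character unigrams and bigrams as features and a multinomial Naive Bayes model with Laplace smoothing; the abstract does not specify the paper's actual feature selection, smoothing, or N-gram lengths, so these choices, the names `char_ngrams` and `NGramNaiveBayes`, and the four toy training sentences are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=2):
    """Return all overlapping character N-grams of length 1..n_max.

    No dictionary or word segmentation is involved: the text is treated
    as a plain sequence of characters (whitespace is dropped).
    """
    chars = [c for c in text if not c.isspace()]
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(chars) - n + 1):
            grams.append("".join(chars[i:i + n]))
    return grams


class NGramNaiveBayes:
    """Multinomial Naive Bayes over character N-gram counts (illustrative)."""

    def __init__(self, n_max=2, alpha=1.0):
        self.n_max = n_max
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)         # documents per class
        self.gram_counts = defaultdict(Counter)   # N-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            grams = char_ngrams(doc, self.n_max)
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def predict(self, doc):
        grams = char_ngrams(doc, self.n_max)
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_docs:
            score = math.log(self.class_docs[c] / n_docs)  # log prior
            denom = self.totals[c] + self.alpha * vocab_size
            for g in grams:
                if g in self.vocab:  # N-grams never seen in training are skipped
                    score += math.log((self.gram_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy training data, invented purely for this illustration.
train_docs = ["股票市场上涨", "银行利率下调", "足球比赛胜利", "篮球联赛冠军"]
train_labels = ["finance", "finance", "sports", "sports"]

model = NGramNaiveBayes(n_max=2).fit(train_docs, train_labels)
print(model.predict("足球联赛"))  # sports
print(model.predict("银行股票"))  # finance
```

Because the features are raw character sequences, the same code runs unchanged on text from any domain or period, which is the sense in which such a classifier can be domain- and time-independent. A kNN variant could reuse `char_ngrams` and compare N-gram count vectors with a similarity measure such as cosine similarity.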


Bibliographic details

Source: Computational Linguistics and Intelligent Text Processing, 2002, pp. 405-414 (10 pages)
Authors: Shuigeng Zhou; Jihong Guan
Original format: PDF
Chinese Library Classification: automation technology, computer technology
Keywords: Chinese documents classification; N-grams; feature selection; Bayesian classification; kNN method; hierarchical classification


Similar literature

1. Hierarchical classification of Chinese documents based on N-grams [J]. Zhou Shui-geng, Guan Ji-hong, He Yan-xiang. Wuhan University Journal of Natural Sciences, 2001, No. 1a2.
2. Classification of documents based on contents using the n-gram method of MNB model [J]. Junaina Jamil Najim Aldin AL-Bayati. International journal of computer science and network security, 2015, No. 10.
3. Sentiment Classification Using N-Gram Inverse Document Frequency and Automated Machine Learning [J]. Rungroj Maipradit, Hideaki Hata, Kenichi Matsumoto. IEEE Software, 2019, No. 5.
4. Hierarchical Classification of Chinese Documents Based on N-grams [C]. Jihong Guan, Shuigeng Zhou. Lecture Notes in Computer Science 2911, International Conference on Asian Digital Libraries, 2003.
5. Automatic biological term annotation using n-gram and classification models [D]. Jiampojamarn, Sittichai. 2005.
6. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents [O]. Deepak Agnihotri, Kesari Verma, Priyanka Tripathi.
7. Novel Topic N-gram Count LM Incorporating Document-based Topic Distributions and N-gram Counts [O]. Haidar Md. Akmal, OShaughnessy D. 2014.
