A High Performance Prototype System for Chinese Text Categorization

Abstract

Improving accuracy is a major challenge in text categorization. This paper proposes a high-performance prototype system for Chinese text categorization. The system consists mainly of a feature extraction subsystem, a feature selection subsystem, and a subsystem that evaluates the reliability of classification results. It employs a two-step classification strategy. In the first step, features that are effective for all test texts are used to classify the texts. The reliability evaluation subsystem then assesses each classification result directly from the classifier's outputs and divides the texts into two groups: those classified reliably and those classified unreliably. Only texts classified unreliably in the first step proceed to the second step, in which a classifier uses subtler, more powerful features suited to those texts to classify them again. The prototype system is implemented with a Naive Bayesian classifier serving as the classifier in both steps. Experiments show that the proposed prototype system achieves high performance.
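
The two-step strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which features each step uses or how reliability is measured, so the character unigram and bigram extractors, the log-score margin test, the `margin` threshold, and the toy training data are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over arbitrary features."""

    def __init__(self, extract):
        self.extract = extract                    # function: text -> list of features
        self.class_docs = Counter()               # documents seen per class
        self.feat_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = self.extract(text)
            self.class_docs[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-posterior of each class for `text`."""
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feat in self.extract(text):
                if feat in self.vocab:            # ignore features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            scores[label] = score
        return scores


def char_unigrams(text):
    # Assumed step-one features: single characters.
    return list(text)


def char_bigrams(text):
    # Assumed step-two features: adjacent character pairs.
    return [text[i:i + 2] for i in range(len(text) - 1)]


def classify_two_step(text, first, second, margin=1.0):
    """Return (label, step). Requires at least two classes.

    Assumed reliability test: the first-step result counts as reliable when
    the gap between the two best log-scores is at least `margin`.
    """
    scores = first.log_scores(text)
    ranked = sorted(scores, key=scores.get, reverse=True)
    if scores[ranked[0]] - scores[ranked[1]] >= margin:
        return ranked[0], 1
    scores2 = second.log_scores(text)             # unreliable: reclassify in step two
    return max(scores2, key=scores2.get), 2


# Hypothetical toy data, for illustration only.
train_texts = ["足球比赛进球", "篮球比赛得分", "股票市场上涨", "银行利率下降"]
train_labels = ["sports", "sports", "finance", "finance"]
first = NaiveBayes(char_unigrams).fit(train_texts, train_labels)
second = NaiveBayes(char_bigrams).fit(train_texts, train_labels)
print(classify_two_step("足球进球", first, second))
```

In this sketch a larger `margin` sends more texts to the second step; how the actual system sets its reliability criterion is not given in the abstract.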