
Top-down Hierarchical method for Chinese Document Classification


Abstract

Existing statistical document classification systems often ignore the hierarchical structure of the predefined categories. This makes it difficult to identify which category a document belongs to when the candidate categories are somewhat similar. In this article, we propose a top-down classification method that follows the hierarchical structure of topics. The aim is to improve the precision and reduce the computation of classification systems. Using a concept dictionary (thesaurus), we map the synonyms and lower-level concepts in a document to a small set of feature words that serve as terms. This further reduces the computational cost by lowering the dimension of the feature space.
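
The two ideas in the abstract, thesaurus-based feature mapping and top-down descent through the category hierarchy, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the thesaurus entries, the category tree, the per-category feature sets, and the count-based scoring rule are all hypothetical, and the input is assumed to be already segmented into word tokens (a step Chinese text would need first).

```python
from collections import Counter

# Hypothetical thesaurus: maps synonyms and lower-level concepts to a feature word.
THESAURUS = {
    "striker": "football", "goalkeeper": "football", "soccer": "football",
    "dunk": "basketball", "rebound": "basketball",
    "shares": "stock", "equities": "stock",
    "loan": "bank", "deposit": "bank",
}

# Hypothetical category tree: each internal node lists its children.
TREE = {
    "root": ["sports", "finance"],
    "sports": ["football_news", "basketball_news"],
    "finance": ["stock_market", "banking"],
}

# Hypothetical feature words associated with each category.
PROFILE = {
    "sports": {"football", "basketball"},
    "finance": {"stock", "bank"},
    "football_news": {"football"},
    "basketball_news": {"basketball"},
    "stock_market": {"stock"},
    "banking": {"bank"},
}


def to_features(tokens):
    """Map each token to its feature word and count only known feature words."""
    vocabulary = set(THESAURUS.values())
    mapped = (THESAURUS.get(token, token) for token in tokens)
    return Counter(word for word in mapped if word in vocabulary)


def classify_top_down(tokens):
    """Descend the tree, following only the best-scoring child at each level."""
    features = to_features(tokens)
    node, path = "root", []
    while node in TREE:  # stop once a leaf category is reached
        node = max(
            TREE[node],
            key=lambda child: sum(features[w] for w in PROFILE[child]),
        )
        path.append(node)
    return path


print(classify_top_down(["striker", "goalkeeper", "dunk"]))
```

In this sketch only the children of the selected node are scored at each level, so sibling categories under a rejected branch are never examined, and the thesaurus collapses nine surface words into four feature words. A real system would replace the simple count with a trained statistical classifier at each node.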
