Conference: International Conference on String Processing and Information Retrieval

Automatic Document Categorization Based on k-NN and Object-Based Thesauri

Abstract

The k-NN classifier (k-NN) is one of the most popular document categorization methods because of its simplicity and relatively good performance. However, its precision degrades significantly when ambiguity arises, that is, when more than one candidate category exists for a document. To remedy this drawback, we propose a new method that incorporates the relationships of object-based thesauri into document categorization using k-NN. Employing the thesaurus entails structuring the categories into taxonomies, since their structure must conform to that of the thesaurus in order to capture the relationships among them. By referencing the relationships in the thesaurus that correspond to the structured categories, k-NN can be improved drastically by removing the ambiguity. In this paper, we first perform document categorization using k-NN and then employ the relationships to reduce the ambiguity. Experimental results show that the proposed approach improves the precision of k-NN by up to 13.86% without compromising its recall.
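The two-stage idea in the abstract (categorize with k-NN first, then consult thesaurus relationships when several candidate categories remain) can be illustrated with a minimal Python sketch. The abstract gives no algorithmic detail, so everything here is an assumption made for illustration and not the authors' method: the cosine-similarity scoring, the `margin` that defines ambiguity, the `boost` weight, the toy `TRAINING` data, and the `RELATED` mapping that stands in for an object-based thesaurus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (document text, category) pairs.
TRAINING = [
    ("java island travel", "geography"),
    ("java compiler code syntax", "programming"),
    ("software release notes build tools", "software"),
    ("recipe pasta sauce", "cooking"),
]

# Hypothetical stand-in for thesaurus relationships between categories.
RELATED = {"programming": ["software"], "software": ["programming"]}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(doc, training, k=3):
    """Sum the similarities of the k nearest training documents per category."""
    vec = Counter(doc.split())
    sims = sorted(
        ((cosine(vec, Counter(text.split())), cat) for text, cat in training),
        reverse=True,
    )
    scores = defaultdict(float)
    for sim, cat in sims[:k]:
        scores[cat] += sim
    return dict(scores)


def categorize(doc, training, related, k=3, margin=0.1, boost=0.5):
    """Stage 1: k-NN scoring. Stage 2: if several categories score within
    `margin` of the best one, re-rank them using related-category scores."""
    scores = knn_scores(doc, training, k)
    if not scores:
        return None
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    candidates = [c for c in ranked if scores[best] - scores[c] <= margin]
    if len(candidates) == 1:
        return best  # no ambiguity, keep the plain k-NN decision

    def adjusted(cat):
        # Assumed heuristic: a candidate gains support from the k-NN
        # scores of categories the thesaurus relates it to.
        support = sum(scores.get(r, 0.0) for r in related.get(cat, ()))
        return scores[cat] + boost * support

    return max(candidates, key=adjusted)


if __name__ == "__main__":
    plain = knn_scores("java release", TRAINING)
    print(max(plain, key=plain.get))                      # plain k-NN choice
    print(categorize("java release", TRAINING, RELATED))  # after disambiguation
```

In this toy setup the query "java release" is ambiguous: plain k-NN narrowly favours "geography", while the related pair "programming" and "software" reinforce each other in the second stage, so the sketch returns "programming". An unambiguous query such as "pasta sauce" skips the second stage entirely.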
