International Symposium on Computer and Information Sciences (ISCIS 2005); October 26-28, 2005; Istanbul, Turkey

Text Categorization with Class-Based and Corpus-Based Keyword Selection

Abstract

In this paper, we examine the use of keywords in text categorization with SVM. Contrary to common belief, we show that using keywords instead of all words yields better performance in terms of both accuracy and time. Unlike previous studies, which focus on keyword selection metrics, we compare two approaches to keyword selection. In the corpus-based approach, a single set of keywords is selected for all classes. In the class-based approach, a distinct set of keywords is selected for each class. We perform experiments on the standard Reuters-21578 dataset with both boolean and tf-idf weighting. Our results show that although tf-idf weighting performs better, boolean weighting can be used where time and space resources are limited. The corpus-based approach with 2000 keywords performs best. However, for a small number of keywords, the class-based approach outperforms the corpus-based approach with the same number of keywords.
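The abstract does not name the keyword-selection metric or the exact tf-idf formula, so the following is only a minimal sketch, under stated assumptions, of how the two selection approaches and the two weightings differ. It assumes chi-square as the term/class scoring metric, the maximum over classes as the corpus-level score, and raw term frequency times log(N/df) as the tf-idf variant. The six-document corpus is hypothetical toy data, not Reuters-21578, and the function names (`chi_square`, `corpus_based`, `class_based`, `boolean_vector`, `tfidf_vector`) are illustrative, not taken from the paper. The resulting vectors would be the SVM input; the SVM itself is not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, class label). Not from Reuters-21578.
DOCS = [
    (["oil", "price", "barrel", "opec"], "crude"),
    (["oil", "opec", "output", "cut"], "crude"),
    (["wheat", "grain", "export", "price"], "grain"),
    (["grain", "corn", "harvest", "export"], "grain"),
    (["profit", "quarter", "share", "price"], "earn"),
    (["share", "dividend", "profit", "net"], "earn"),
]

def chi_square(term, cls, docs):
    """Chi-square statistic of the term/class 2x2 table (assumed metric)."""
    n = len(docs)
    a = sum(1 for toks, label in docs if term in toks and label == cls)      # term in, class in
    b = sum(1 for toks, label in docs if term in toks and label != cls)      # term in, class out
    c = sum(1 for toks, label in docs if term not in toks and label == cls)  # term out, class in
    d = n - a - b - c                                                        # term out, class out
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def vocabulary(docs):
    return sorted({t for toks, _ in docs for t in toks})

def corpus_based(docs, k):
    """One keyword set shared by all classes: top-k terms by max chi-square over classes."""
    classes = sorted({label for _, label in docs})
    scores = {t: max(chi_square(t, cl, docs) for cl in classes) for t in vocabulary(docs)}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def class_based(docs, k):
    """A distinct keyword set per class: top-k terms by chi-square for that class."""
    classes = sorted({label for _, label in docs})
    vocab = vocabulary(docs)
    return {cl: sorted(vocab, key=lambda t: (-chi_square(t, cl, docs), t))[:k] for cl in classes}

def boolean_vector(tokens, keywords):
    """Boolean weighting: 1 if the keyword occurs in the document, else 0."""
    present = set(tokens)
    return [1 if kw in present else 0 for kw in keywords]

def tfidf_vector(tokens, keywords, docs):
    """tf-idf weighting: raw term frequency times log(N / df) (assumed variant)."""
    n = len(docs)
    tf = Counter(tokens)
    vec = []
    for kw in keywords:
        df = sum(1 for toks, _ in docs if kw in toks)
        vec.append(tf[kw] * math.log(n / df) if df else 0.0)
    return vec

if __name__ == "__main__":
    shared = corpus_based(DOCS, 3)
    print(shared)
    print(class_based(DOCS, 2))
    print(boolean_vector(DOCS[0][0], shared))
    print(tfidf_vector(DOCS[0][0], shared, DOCS))
```

On this toy corpus, `corpus_based(DOCS, 3)` returns `['export', 'grain', 'oil']`, which contains no keyword for the `earn` class, while `class_based(DOCS, 2)` gives every class its own two keywords. In a one-vs-rest SVM setup, each per-class classifier would presumably use only its own class's keyword set under the class-based approach.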