7th Pacific Rim International Conference on Artificial Intelligence, Aug 18-22, 2002, Tokyo, Japan

A Comparative Study on Statistical Machine Learning Algorithms and Thresholding Strategies for Automatic Text Categorization


Abstract

Two main research areas in statistical text categorization are similarity-based learning algorithms and associated thresholding strategies. The combination of these techniques significantly influences the overall performance of text categorization. After investigating two similarity-based classifiers (k-NN and Rocchio) and three common thresholding techniques (RCut, PCut, and SCut), we describe a new learning algorithm known as the keyword association network (KAN) and a new thresholding strategy (RinSCut) to improve performance over existing techniques. Extensive experiments have been conducted on the Reuters-21578 and 20-Newsgroups data sets. The experimental results show that our new approaches give better results for both micro-averaged F_1 and macro-averaged F_1 scores.
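The abstract names the three baseline thresholding strategies and the two evaluation measures without defining them. As a rough illustration only, the Python sketch below follows the definitions commonly used in the text categorization literature: RCut keeps the top-ranked categories for each document, PCut gives each category a share of documents proportional to its prior, and SCut applies a separate score threshold per category. The function names, the score-matrix layout, and the exact PCut rounding rule are assumptions of this sketch, not details taken from the paper. KAN and RinSCut are not shown because the abstract does not describe how they work.

```python
from typing import List, Tuple

Scores = List[List[float]]    # scores[d][c]: similarity of document d to category c
Decisions = List[List[bool]]  # decisions[d][c]: True if d is assigned to c


def rcut(scores: Scores, t: int) -> Decisions:
    """Rank-based cut: each document receives its t highest-scoring categories."""
    out = []
    for row in scores:
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:t]
        out.append([c in top for c in range(len(row))])
    return out


def pcut(scores: Scores, priors: List[float], x: float) -> Decisions:
    """Proportion-based cut: category c receives its k_c top-scoring documents.

    Assumption of this sketch: k_c = round(priors[c] * x * number_of_documents).
    """
    n_docs, n_cat = len(scores), len(priors)
    out = [[False] * n_cat for _ in range(n_docs)]
    for c in range(n_cat):
        k = min(n_docs, max(0, round(priors[c] * x * n_docs)))
        ranked = sorted(range(n_docs), key=lambda d: scores[d][c], reverse=True)
        for d in ranked[:k]:
            out[d][c] = True
    return out


def scut(scores: Scores, thresholds: List[float]) -> Decisions:
    """Score-based cut: category c is assigned when the score reaches thresholds[c]."""
    return [[s >= thresholds[c] for c, s in enumerate(row)] for row in scores]


def f1(tp: int, fp: int, fn: int) -> float:
    """F_1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(pred: Decisions, gold: Decisions) -> Tuple[float, float]:
    """Micro-averaged F_1 pools counts over categories; macro-averaged F_1
    averages the per-category F_1 values."""
    n_cat = len(gold[0])
    counts = []
    for c in range(n_cat):
        tp = sum(1 for p, g in zip(pred, gold) if p[c] and g[c])
        fp = sum(1 for p, g in zip(pred, gold) if p[c] and not g[c])
        fn = sum(1 for p, g in zip(pred, gold) if not p[c] and g[c])
        counts.append((tp, fp, fn))
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    macro = sum(f1(*c) for c in counts) / n_cat
    return micro, macro
```

In practice the parameters (t for RCut, x for PCut, the per-category thresholds for SCut) would be tuned on held-out data. Micro-averaged F_1 is dominated by frequent categories, whereas macro-averaged F_1 weights every category equally, which is why both are usually reported.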
