A Comparative Study on Statistical Machine Learning Algorithms and Thresholding Strategies for Automatic Text Categorization

Abstract

Two main research areas in statistical text categorization are similarity-based learning algorithms and associated thresholding strategies. The combination of these techniques significantly influences the overall performance of text categorization. After investigating two similarity-based classifiers (k-NN and Rocchio) and three common thresholding techniques (RCut, PCut, and SCut), we describe a new learning algorithm known as the keyword association network (KAN) and a new thresholding strategy (RinSCut) to improve performance over existing techniques. Extensive experiments have been conducted on the Reuters-21578 and 20-Newsgroups data sets. The experimental results show that our new approaches give better results for both micro-averaged F1 and macro-averaged F1 scores.
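The thresholding strategies and evaluation measures named in the abstract can be illustrated with a small sketch. The Python below is a minimal, hypothetical illustration assuming the common textbook definitions: RCut assigns each document its top-t ranked categories, PCut assigns each category a number of top-ranked documents proportional to its training-set prevalence, and SCut learns a per-category score threshold on validation data. It also computes micro-averaged F1 (true positives, false positives and false negatives pooled over all categories) and macro-averaged F1 (the unweighted mean of per-category F1). It does not reproduce the paper's KAN classifier or RinSCut strategy; all function names, the toy scores and the category priors are assumptions made for illustration.

```python
from typing import List, Tuple

Scores = List[List[float]]  # documents x categories; higher score = more relevant
Labels = List[List[int]]    # documents x categories; 1 = assigned, 0 = not assigned


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined (an assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def rcut(scores: Scores, t: int) -> Labels:
    """RCut (rank-based): give every document its t highest-scoring categories."""
    labels = []
    for row in scores:
        top = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:t])
        labels.append([1 if j in top else 0 for j in range(len(row))])
    return labels


def pcut(scores: Scores, prior: List[float], x: float) -> Labels:
    """PCut (proportion-based): category j receives its k_j top-ranked documents,
    where k_j = round(prior[j] * x * number_of_documents)."""
    n_docs, n_cats = len(scores), len(scores[0])
    labels = [[0] * n_cats for _ in range(n_docs)]
    for j in range(n_cats):
        k = round(prior[j] * x * n_docs)
        ranked = sorted(range(n_docs), key=lambda i: scores[i][j], reverse=True)
        for i in ranked[:k]:
            labels[i][j] = 1
    return labels


def scut(scores: Scores, thresholds: List[float]) -> Labels:
    """SCut (score-based): assign category j whenever its score reaches thresholds[j]."""
    return [[1 if s >= thresholds[j] else 0 for j, s in enumerate(row)] for row in scores]


def tune_scut(scores: Scores, gold: Labels) -> List[float]:
    """Pick each category's threshold from its observed scores so as to maximize
    that category's F1 on validation data (ties keep the higher threshold)."""
    thresholds = []
    for j in range(len(scores[0])):
        col = [row[j] for row in scores]
        truth = [g[j] for g in gold]
        best_t, best_f = max(col), -1.0
        for t in sorted(set(col), reverse=True):
            tp = sum(1 for s, y in zip(col, truth) if s >= t and y == 1)
            fp = sum(1 for s, y in zip(col, truth) if s >= t and y == 0)
            fn = sum(1 for s, y in zip(col, truth) if s < t and y == 1)
            score = f1(tp, fp, fn)
            if score > best_f:
                best_t, best_f = t, score
        thresholds.append(best_t)
    return thresholds


def micro_macro_f1(pred: Labels, gold: Labels) -> Tuple[float, float]:
    """Micro-F1 pools TP/FP/FN over all categories; macro-F1 averages per-category F1."""
    per_cat, tot_tp, tot_fp, tot_fn = [], 0, 0, 0
    for j in range(len(gold[0])):
        tp = sum(1 for p, g in zip(pred, gold) if p[j] == 1 and g[j] == 1)
        fp = sum(1 for p, g in zip(pred, gold) if p[j] == 1 and g[j] == 0)
        fn = sum(1 for p, g in zip(pred, gold) if p[j] == 0 and g[j] == 1)
        per_cat.append(f1(tp, fp, fn))
        tot_tp, tot_fp, tot_fn = tot_tp + tp, tot_fp + fp, tot_fn + fn
    return f1(tot_tp, tot_fp, tot_fn), sum(per_cat) / len(per_cat)


if __name__ == "__main__":
    # Toy data: 4 documents x 3 categories (made-up values, for illustration only).
    scores = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4], [0.6, 0.7, 0.2], [0.1, 0.3, 0.9]]
    gold = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    prior = [0.5, 0.5, 0.25]  # hypothetical training-set category proportions
    runs = {
        "RCut(t=1)": rcut(scores, 1),
        "RCut(t=2)": rcut(scores, 2),
        "PCut(x=1)": pcut(scores, prior, 1.0),
        # SCut should be tuned on held-out data; reused here only for brevity.
        "SCut": scut(scores, tune_scut(scores, gold)),
    }
    for name, pred in runs.items():
        micro, macro = micro_macro_f1(pred, gold)
        print(f"{name}: micro-F1={micro:.3f} macro-F1={macro:.3f}")
    # RCut(t=1): micro-F1=0.889 macro-F1=0.889
    # RCut(t=2): micro-F1=0.769 macro-F1=0.778
    # PCut(x=1): micro-F1=1.000 macro-F1=1.000
    # SCut: micro-F1=1.000 macro-F1=1.000
```

PCut and the tuned SCut recover the gold labels here only because the toy data are tiny and SCut is tuned and evaluated on the same documents; in practice its thresholds would be fitted on a held-out validation split. The RCut(t=2) run shows how the two averages can diverge: micro-F1 pools counts across categories, whereas macro-F1 gives every category equal weight regardless of its size.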

Bibliographic Record

Source
7th Pacific Rim International Conference on Artificial Intelligence, Aug 18-22, 2002, Tokyo, Japan | 2002 | pp. 444-453 | 10 pages
Conference location: Tokyo, Japan
Authors
Kang Hyuk Lee; Judy Kay; Byeong Ho Kang; Uwe Rosebrock
Author affiliation

School of Information Technologies, University of Sydney, NSW 2006, Australia

Original format: PDF
Text language: English
Chinese Library Classification: Automation technology, computer technology

Similar Literature

1. A comparative study of evolving fuzzy grammar and machine learning techniques for text categorization [J]. Sharef Nurfadhlina Mohd, Martin Trevor, Kasmiran Khairul Azhar. Soft Computing: A Fusion of Foundations, Methodologies and Applications, 2015, no. 6.

2. Statistical Bayesian Learning for Automatic Arabic Text Categorization [J]. Bassam Al-Salemi, Mohd. Juzaiddin Ab Aziz. Journal of Computer Sciences, 2011, no. 1.

3. Statistical Bayesian Learning for Automatic Arabic Text Categorization [J]. Bassam Al-Salemi, Mohd. J. Ab Aziz. Journal of Computer Sciences, 2010, no. 1.

4. A Comparative Study on Statistical Machine Learning Algorithms and Thresholding Strategies for Automatic Text Categorization [C]. Kang Hyuk Lee, Judy Kay, Byeong Ho Kang. Pacific Rim International Conference on Artificial Intelligence, 2002.

5. Machine Learning and Text Analysis Using Clustering, Classification, Categorization for Applied Industry Research and Its Effect on Trends and Prediction Analysis of a Doctor of Professionals Studies in Computing Dissertation Categories [D]. Ashley Haigler. 2021.

6. Automatic Semantic Categorization of News Headlines using Ensemble Machine Learning: A Comparative Study [O]. Raghad Bogery, Nora Al, Nida Aslam. 2019.


A Comparative Study on Statistical Machine Learning Algorithms and Thresholding Strategies for Automatic Text Categorization

摘要

著录项

相似文献

相关主题

期刊订阅