Evaluation of Statistical Approaches to Text Categorization

Abstract

This paper is a comparative study of text categorization methods. Fourteen methods are investigated, based on previously published results and newly obtained results from additional experiments. Corpus biases in commonly used document collections are examined using the performance of three classifiers. Problems in previously published experiments are analyzed, and the results of flawed experiments are excluded from the cross-method evaluation. As a result, eleven of the fourteen methods remain. A k-nearest neighbor (kNN) classifier was chosen as the performance baseline on several collections; on each collection, the performance scores of the other methods were normalized using the score of kNN. This provides a common basis for a global comparison of methods whose results are available only on individual collections. Widrow-Hoff, k-nearest neighbor, neural networks and the Linear Least Squares Fit mapping are the top-performing classifiers, while the Rocchio approaches had relatively poor results compared to the other learning methods. kNN is the only learning method that has scaled to the full domain of MEDLINE categories, showing graceful behavior when the target space grows from the level of one hundred categories to a level of tens of thousands.
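
The baseline normalization described in the abstract can be illustrated with a minimal sketch. The abstract says only that scores were normalized using the kNN score on each collection; treating that as a simple ratio is an assumption made here. The function name `normalize_by_baseline`, the collection and method labels, and every number below are hypothetical placeholders, not results from the paper.

```python
def normalize_by_baseline(scores, baseline="kNN"):
    """Express each method's score relative to the baseline on the same collection.

    `scores` maps collection name -> {method name -> raw score}.
    Assumption: normalization is a plain ratio (method score / baseline score).
    """
    normalized = {}
    for collection, by_method in scores.items():
        if baseline not in by_method:
            # Without a baseline run on this collection, nothing is comparable.
            continue
        base = by_method[baseline]
        if base <= 0:
            raise ValueError(f"baseline score on {collection!r} must be positive")
        normalized[collection] = {
            method: score / base for method, score in by_method.items()
        }
    return normalized


# Hypothetical placeholder scores, chosen only to show the arithmetic.
scores = {
    "collection_A": {"kNN": 0.80, "method_X": 0.60},
    "collection_B": {"kNN": 0.50, "method_Y": 0.55},
}

for collection, relative in normalize_by_baseline(scores).items():
    print(collection, relative)
```

With this form, a value above 1 means a method scored higher than kNN on that collection and a value below 1 means it scored lower, which is what allows `method_X` and `method_Y` to be compared even though they were never run on the same collection.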

Bibliographic record

Author: Yang, Y.
Year: 1997
Total pages: 12
Original format: PDF
Language: English (eng)
Chinese Library Classification: Industrial technology
Keywords: Expert systems; Text processing; Data bases; Algorithms; Neural nets; Learning machines; Regression analysis; Least squares method


