Automatic Document Categorization Interpreting the Perfomance of Clustering Algorithms

Abstract

Clustering a document collection is the current approach to automatically deriving the underlying document categories. The categorization performance of a document clustering algorithm can be captured by the F-Measure, which quantifies how closely the clustering resembles a human-defined categorization. However, a poor F-Measure value says nothing about why a clustering algorithm performs poorly. Among several possible explanations, the most interesting question is the following: Are the implicit assumptions of the clustering algorithm admissible with respect to a document categorization task? Though the use of clustering algorithms for document categorization is widely accepted, no foundation or rationale has been stated for this admissibility question. This paper is devoted to this gap. It presents considerations and a measure to quantify the sensitivity of a clustering process to geometric distortions of the data space. Together with the method of multidimensional scaling, this measure provides an instrument for assessing a clustering algorithm's adequacy.
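To make the F-Measure mentioned in the abstract concrete, here is a minimal sketch of one commonly used clustering F-Measure: for each human-defined class, take the best harmonic mean of precision and recall over all clusters, then average these values weighted by class size. This is an assumed, widely used definition and may differ in detail from the one used in the paper; the function name `clustering_f_measure` and its parameters are hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted best-match F-Measure of a clustering.

    labels_true: human-defined category of each document.
    labels_pred: cluster assigned to each document by the algorithm.
    NOTE: an assumed, common definition, not necessarily the paper's.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(labels_true, labels_pred))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common == 0:
                continue
            precision = common / clu_size
            recall = common / cls_size
            f_value = 2 * precision * recall / (precision + recall)
            best = max(best, f_value)
        # Weight the best match of each class by its share of the collection.
        total += (cls_size / n) * best
    return total


if __name__ == "__main__":
    truth = ["sports", "sports", "politics", "politics"]
    print(clustering_f_measure(truth, [0, 0, 1, 1]))  # clusters match the classes
    print(clustering_f_measure(truth, [0, 0, 0, 0]))  # everything in one cluster
```

A value of 1.0 means every class is reproduced exactly by some cluster. As the abstract points out, a low value alone does not reveal why the algorithm failed.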

Bibliographic record

Source: German Conference on Artificial Intelligence, 2003, 13 pages
Authors: Benno Stein; Sven Meyer zu Eissen
Original format: PDF
Chinese Library Classification: TP18-532
Keywords: document categorization; clustering; F-Measure; multidimensional scaling; information visualization

Similar literature

1. Semi-supervised fuzzy co-clustering algorithm for document categorization [J]. Yang Yan, Lihui Chen, William-Chandra Tjhi. Knowledge and Information Systems, 2013, No. 1
2. A robust automatic clustering algorithm for probability density functions with application to categorizing color images [J]. Chen J. H., Chang Y. C., Hung W. L. Communications in Statistics, 2018, No. 6a7
3. A modified fuzzy clustering for documents retrieval: Application to document categorization [J]. S. Nefti, Y. Rezgui, M. Oussalah. Operations Research, 2010, No. 1a2
4. Automatic Document Categorization Interpreting the Perfomance of Clustering Algorithms [C]. Benno Stein, Sven Meyer zu Eissen. German Conference on Artificial Intelligence, 2003
5. Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters [D]. Li, Xiaoxiao. 2012
6. Comparing a Rule Based vs. Statistical System for Automatic Categorization of MEDLINE® Documents According to Biomedical Specialty [O]. Susanne M. Humphrey, Aurélie Névéol, Julien Gobeil
7. Automatic thematic categorization of documents using a fuzzy taxonomy and fuzzy hierarchical clustering [O]. Wallace, M, Akrivas, G, Stamou, G. 2003
8. Automatic Word Categorization with Genetic Algorithms [R]. Lankhorst, M. M. 1994
