Conference: German Conference on Artificial Intelligence

Automatic Document Categorization: Interpreting the Performance of Clustering Algorithms

Abstract

Clustering a document collection is the current approach to automatically deriving the underlying document categories. The categorization performance of a document clustering algorithm can be captured by the F-Measure, which quantifies how closely the clustering resembles a human-defined categorization. However, a poor F-Measure value says nothing about why a clustering algorithm performs poorly. Among several possible explanations, the most interesting question is the following: are the implicit assumptions of the clustering algorithm admissible with respect to a document categorization task? Though the use of clustering algorithms for document categorization is widely accepted, no foundation or rationale has been stated for this admissibility question. This paper addresses that gap. It presents considerations and a measure to quantify the sensitivity of a clustering process with regard to geometric distortions of the data space. Together with multidimensional scaling, this measure provides an instrument for assessing a clustering algorithm's adequacy.
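
The abstract does not say which F-Measure variant the paper uses. As a minimal sketch, the following assumes the commonly used class-weighted variant: each reference category is matched with the cluster that gives it the best harmonic mean of precision and recall, and these best scores are averaged, weighted by category size. The function name `clustering_f_measure` and its parameters are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted F-Measure of a clustering against reference categories.

    Assumed variant (not confirmed by the source): for every reference
    category, take the best F score over all clusters, then average these
    best scores weighted by category size.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(labels_true)               # documents per category
    cluster_sizes = Counter(labels_pred)             # documents per cluster
    overlap = Counter(zip(labels_true, labels_pred)) # shared documents

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            shared = overlap[(cls, clu)]
            if shared == 0:
                continue  # F is zero for disjoint category/cluster pairs
            precision = shared / clu_size
            recall = shared / cls_size
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
        total += (cls_size / n) * best
    return total
```

Under this assumed definition, a clustering that reproduces the reference categories exactly scores 1.0, and the score drops as clusters mix categories or split them. As the abstract points out, the value alone does not explain why an algorithm scores poorly.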