Clustering Documents Along Multiple Dimensions

Abstract

Traditional clustering algorithms are designed to search for a single clustering solution despite the fact that multiple alternative clustering solutions might exist for a particular dataset. For example, a set of news articles might be clustered by topic or by the author's gender or age. Similarly, book reviews might be clustered by sentiment or comprehensiveness. In this paper, we address the problem of identifying alternative clustering solutions by developing a Probabilistic Multi-Clustering (PMC) model that discovers multiple, maximally different clusterings of a data sample. Empirical results on six datasets representative of real-world applications show that our PMC model exhibits superior performance to comparable multi-clustering algorithms.
