Conference: 18th ACM conference on information and knowledge management 2009

Interpretable and reconfigurable clustering of document datasets by deriving word-based rules


Abstract

Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that require reconfigurable and highly interpretable clusters, and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop a clustering algorithm toward this goal; it generates rules with disjunctions and conditions on word frequencies that decide whether a document belongs to a cluster. Each cluster consists of precisely the set of documents that satisfy the corresponding rule. We show that our approach outperforms the unsupervised decision tree approach by large margins, and that the losses in purity and F-measure incurred to achieve interpretability are as small as 5% and 3%, respectively.
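To make the idea of a word-based cluster rule concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm and does not show how rules are derived. It assumes a rule is a disjunction (OR) of conditions of the form "word occurs at least k times", and that each cluster is exactly the set of documents satisfying its rule. The names `Condition`, `Rule`, `assign` and `purity` are hypothetical, and the toy documents and labels are invented for illustration. Purity is computed with its standard definition (sum of majority-class sizes over clusters, divided by the number of documents).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Hypothetical condition: `word` occurs at least `min_count` times in a document."""
    word: str
    min_count: int = 1

    def holds(self, counts: Counter) -> bool:
        # Counter returns 0 for absent words, so missing words fail the test.
        return counts[self.word] >= self.min_count


@dataclass(frozen=True)
class Rule:
    """Hypothetical cluster rule: a disjunction (OR) of word-frequency conditions."""
    conditions: tuple

    def matches(self, counts: Counter) -> bool:
        return any(cond.holds(counts) for cond in self.conditions)


def assign(docs, rules):
    """For each rule, return the indices of the documents that satisfy it.

    Each cluster is exactly the set of documents satisfying its rule, so a
    document may fall into several clusters or into none; this sketch does
    not resolve such overlaps.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    return [{i for i, c in enumerate(counts) if rule.matches(c)} for rule in rules]


def purity(clusters, labels):
    """Standard purity: sum of each cluster's majority-class size, divided by N."""
    total = 0
    for members in clusters:
        if members:
            total += Counter(labels[i] for i in members).most_common(1)[0][1]
    return total / len(labels)


# Toy example (invented data): two rules over a four-document corpus.
docs = [
    "the striker scored a goal in the match",
    "the goal keeper saved the penalty",
    "the central bank raised interest rates",
    "stock market rates fell as the bank warned",
]
labels = ["sport", "sport", "finance", "finance"]
rules = [
    Rule((Condition("goal"), Condition("match"))),     # goal >= 1 OR match >= 1
    Rule((Condition("bank"), Condition("rates", 2))),  # bank >= 1 OR rates >= 2
]
clusters = assign(docs, rules)
print([sorted(c) for c in clusters])  # [[0, 1], [2, 3]]
print(purity(clusters, labels))       # 1.0
```

Because each cluster is defined by a readable rule, reconfiguring a cluster amounts to editing its list of conditions and calling `assign` again, which is the property the abstract emphasizes.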
