Interpretable and reconfigurable clustering of document datasets by deriving word-based rules

Abstract

Clusters of text documents produced by clustering algorithms are often hard to interpret. We describe real-world scenarios that motivate the need for reconfigurable and highly interpretable clusters, and we outline the problem of generating clusterings with interpretable and reconfigurable cluster models. Toward this goal, we develop a clustering algorithm that generates rules, built from disjunctions and from conditions on word frequencies, which decide whether a document belongs to a cluster. Each cluster consists of precisely the set of documents that satisfy its corresponding rule. We show that our approach outperforms the unsupervised decision tree approach by large margins, and that the losses in purity and F-measure incurred to achieve interpretability are as small as 5% and 3%, respectively.
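To make the rule-based membership idea concrete, here is a minimal sketch, not the authors' algorithm: it only shows how a set of already-learned rules would assign documents and how purity could be scored, and it does not learn the rules. It assumes a rule is an OR over conditions of the form "word occurs at least k times", uses naive lowercase whitespace tokenization, and normalizes purity by total cluster memberships; `Condition`, `satisfies`, `assign_clusters`, and `purity` are hypothetical names, and the paper's actual rule grammar may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Hypothetical rule atom: `word` must occur at least `min_count` times."""
    word: str
    min_count: int = 1

    def holds(self, counts: Counter) -> bool:
        # Counter returns 0 for absent words, so a missing word fails the test.
        return counts[self.word] >= self.min_count


def satisfies(rule: list[Condition], counts: Counter) -> bool:
    """Assumption: a rule is a disjunction (OR) of word-frequency conditions."""
    return any(cond.holds(counts) for cond in rule)


def assign_clusters(docs: list[str], rules: dict[str, list[Condition]]) -> dict[str, set[int]]:
    """Each cluster is exactly the set of document indices satisfying its rule."""
    # Naive tokenization: lowercase, then split on whitespace (an assumption).
    counts = [Counter(doc.lower().split()) for doc in docs]
    return {
        name: {i for i, c in enumerate(counts) if satisfies(rule, c)}
        for name, rule in rules.items()
    }


def purity(clusters: dict[str, set[int]], labels: list[str]) -> float:
    """Sum of each cluster's majority-label count divided by total memberships.

    Dividing by memberships (not len(labels)) keeps the score <= 1 if rules
    overlap; documents matched by no rule are simply not counted.
    """
    total = sum(len(members) for members in clusters.values())
    if total == 0:
        return 0.0
    majority = sum(
        max(Counter(labels[i] for i in members).values())
        for members in clusters.values()
        if members
    )
    return majority / total


# Toy usage with made-up documents and hand-written rules.
docs = [
    "the goal was scored in the match",
    "the match ended with a late goal",
    "the election vote count was delayed",
    "voters went to the polls for the election",
]
labels = ["sports", "sports", "politics", "politics"]
rules = {
    "sports": [Condition("match"), Condition("goal", 2)],
    "politics": [Condition("election"), Condition("vote")],
}
clusters = assign_clusters(docs, rules)
print(clusters)
print(purity(clusters, labels))  # → 1.0
```

Because each cluster is defined only by its rule, reconfiguring a cluster in this sketch amounts to editing the list of conditions and re-running `assign_clusters`.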

Bibliographic record

Source
18th ACM Conference on Information and Knowledge Management 2009 | 2009 | pp. 1773–1776 | 4 pages
Authors
Vipin Balachandran; Deepak P; Deepak Khemani;
Original format: PDF
Chinese Library Classification
Information processing (information handling)
Keywords
interpretable clustering;

Similar documents

1. Interpretable and reconfigurable clustering of document datasets by deriving word-based rules [J]. Vipin Balachandran, Deepak P., Deepak Khemani. Knowledge and Information Systems, 2012, Issue 3.
2. Interpretable and reconfigurable clustering of document datasets by deriving word-based rules [J]. Vipin Balachandran, Deepak P, Deepak Khemani. Knowledge and Information Systems, 2012, Issue 3.
3. A hybrid reciprocal model of PCA and K-means with an innovative approach of considering sub-datasets for the improvement of K-means initialization and step-by-step labeling to create clusters with high interpretability [J]. Anaraki Seyed Alireza Mousavian, Haeri Abdorrahman, Moslehi Fateme. Pattern Analysis and Applications, 2021, Issue 3.
4. Interpretable and Reconfigurable Clustering of Document Datasets by Deriving Word-based Rules [C]. Vipin Balachandran, Deepak P, Deepak Khemani. 18th ACM Conference on Information and Knowledge Management 2009, 2009.
5. Supervised precision ordinal clustering – A human-machine learning algorithm to create accurate clusters in big datasets: Application to Indiana water quality data with novel visualization techniques [D]. Singh, Sarabjit. 2014.
6. An interpretable framework for clustering single-cell RNA-Seq datasets [O]. Jesse M. Zhang, Jue Fan, H. Christina Fan. 2018.
7. Interpretable and Reconfigurable Clustering of Document Datasets by Deriving Word-based Rules [O]. Balachandran, Vipin, Padmanabhan, Deepak, Khemani, Deepak. 2012.
