Conference: ACM SIGKDD international conference on Knowledge discovery in data mining

Combining partitions by probabilistic label aggregation

Abstract

Data clustering is an important tool in exploratory data analysis. The lack of objective criteria renders model selection, as well as the identification of robust solutions, particularly difficult. Stability assessment and the combination of multiple clustering solutions are important ingredients in finding useful partitions. In this work, we propose a novel way of combining multiple clustering solutions for both hard and soft partitions: the approach is based on modeling the probability that two objects are grouped together. An efficient EM optimization strategy is employed to estimate the model parameters. Our proposal can also be extended to emphasize the signal more strongly by weighting individual base clustering solutions according to their consistency with the prediction for previously unseen objects. In addition, the probabilistic model supports an out-of-sample extension that (i) makes it possible to assign previously unseen objects to classes of the combined solution and (ii) renders the efficient aggregation of solutions possible. We also shed some light on the usefulness of such combination approaches. In the experimental results section, we demonstrate the competitive performance of our proposal in comparison with other recently proposed methods for combining multiple classifications of a finite data set.
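
The quantity the abstract centers on, the probability that two objects are grouped together, can be illustrated with a simple empirical estimate: the fraction of base clusterings in which a pair shares a cluster. The sketch below is not the paper's EM-based model; the function name `coassociation` and the toy labelings are assumptions made purely for illustration, and only hard partitions are handled.

```python
def coassociation(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that assign objects i and j to the same cluster.

    `labelings` is a list of hard partitions, each a list of cluster labels
    of equal length n. (Hypothetical helper, not taken from the paper.)
    """
    m = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


# Three toy base clusterings of four objects (illustrative data only).
# The first two describe the same partition with swapped label names.
base = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = coassociation(base)
for row in C:
    print([round(v, 2) for v in row])
```

Because only label equality within each clustering is compared, the estimate does not depend on how the individual base clusterings name their clusters, which is one reason pairwise formulations are convenient when combining partitions.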
