Conference: SIAM International Conference on Data Mining

Mixture models and frequent sets: combining global and local methods for 0-1 data

Abstract

We study the interaction between global and local techniques in data mining. Specifically, we examine the collections of frequent sets in clusters produced by probabilistic clustering with mixtures of Bernoulli models. That is, we first analyze 0-1 datasets with a global technique (probabilistic clustering using the EM algorithm) and then perform a local analysis (discovery of frequent sets) in each of the clusters. The results indicate that using clustering as a preliminary phase in finding frequent sets produces clusters that have significantly different collections of frequent sets. We also test the significance of the differences between the frequent set collections in the different clusters by obtaining estimates of the underlying joint density. To get from the local patterns in each cluster back to distributions, we use the maximum entropy technique [17] to obtain a local model for each cluster, and then combine these local models to get a mixture model. We obtain clear improvements in approximation quality over using either the mixture model or the maximum entropy model alone.
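The first two stages described in the abstract (EM clustering with a mixture of Bernoulli components, then frequent set discovery inside each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_bernoulli_mixture` and `frequent_sets`, the Laplace smoothing in the M-step, the hard assignment of rows to clusters, the support threshold of 0.6, and the toy dataset are all assumptions made for illustration. The maximum entropy step is not covered.

```python
import math
import random
from itertools import combinations


def fit_bernoulli_mixture(rows, k, iters=50, seed=0):
    """EM for a mixture of k multivariate Bernoulli components on 0-1 rows."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    weights = [1.0 / k] * k
    # Random initial parameters, kept away from 0 and 1.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each row.
        resp = []
        for x in rows:
            logp = [
                math.log(weights[c])
                + sum(math.log(t if xi else 1.0 - t) for xi, t in zip(x, theta[c]))
                for c in range(k)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step with Laplace smoothing (an assumption, to avoid log(0)).
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = (nc + 1.0) / (n + k)
            theta[c] = [
                (sum(r[c] * x[j] for r, x in zip(resp, rows)) + 1.0) / (nc + 2.0)
                for j in range(d)
            ]
    return weights, theta, resp


def frequent_sets(rows, min_support, max_size=3):
    """Level-wise search: map each frequent attribute set to its support."""
    result = {}
    if not rows:
        return result
    n, d = len(rows), len(rows[0])
    for size in range(1, max_size + 1):
        found = False
        for items in combinations(range(d), size):
            # Apriori pruning: every (size-1)-subset must already be frequent.
            if size > 1 and any(
                sub not in result for sub in combinations(items, size - 1)
            ):
                continue
            support = sum(all(x[j] for j in items) for x in rows) / n
            if support >= min_support:
                result[items] = support
                found = True
        if not found:
            break
    return result


# Toy 0-1 data (hypothetical): two groups with disjoint active attributes.
rows = [[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20

_, _, resp = fit_bernoulli_mixture(rows, k=2)
# Hard-assign each row to its most probable component.
labels = [max(range(2), key=lambda c: r[c]) for r in resp]

global_sets = frequent_sets(rows, min_support=0.6)
cluster_sets = {
    c: frequent_sets([x for x, l in zip(rows, labels) if l == c], min_support=0.6)
    for c in set(labels)
}
print("global:", sorted(global_sets))
for c, sets in sorted(cluster_sets.items()):
    print("cluster", c, ":", sorted(sets))
```

On this toy data no attribute set reaches the support threshold globally, while each cluster has its own collection of frequent sets, which is the kind of difference between clusters the abstract refers to.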