Mixture models and frequent sets: combining global and local methods for 0-1 data

Abstract

We study the interaction between global and local techniques in data mining. Specifically, we study the collections of frequent sets in clusters produced by a probabilistic clustering using mixtures of Bernoulli models. That is, we first analyze 0-1 datasets by a global technique (probabilistic clustering using the EM algorithm) and then do a local analysis (discovery of frequent sets) in each of the clusters. The results indicate that the use of clustering as a preliminary phase in finding frequent sets produces clusters that have significantly different collections of frequent sets. We also test the significance of the differences in the frequent set collections in the different clusters by obtaining estimates of the underlying joint density. To get from the local patterns in each cluster back to distributions, we use the maximum entropy technique [17] to obtain a local model for each cluster, and then combine these local models to get a mixture model. We obtain clear improvements to the approximation quality against the use of either the mixture model or the maximum entropy model.
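The first two stages described in the abstract (global clustering with a mixture of Bernoulli models fitted by EM, then local frequent-set discovery inside each cluster) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `em_bernoulli_mixture` and `frequent_sets`, the toy data, the deterministic initialisation, the smoothing constants, and the brute-force itemset enumeration (used in place of a levelwise algorithm such as Apriori). The maximum entropy step is not covered.

```python
import math
from itertools import combinations


def em_bernoulli_mixture(data, k, iters=30):
    """Fit a k-component mixture of independent Bernoullis by EM.

    data is a list of equal-length 0/1 rows. Returns (weights, thetas, labels),
    where labels[i] is the most probable component for row i.
    """
    n = len(data)
    weights = [1.0 / k] * k
    # Assumption: seed component j from an evenly spaced data row (a simple,
    # deterministic initialisation; random restarts are the usual choice).
    thetas = [[0.75 if x else 0.25 for x in data[j * n // k]] for j in range(k)]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each row,
        # computed in log space and normalised for numerical stability.
        resp = []
        for row in data:
            logp = [
                math.log(weights[j])
                + sum(math.log(t if x else 1.0 - t) for x, t in zip(row, thetas[j]))
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: smoothed updates keep every parameter strictly inside (0, 1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + 1e-6) / (n + k * 1e-6)
            thetas[j] = [
                (sum(r[j] * row[i] for r, row in zip(resp, data)) + 0.01) / (nj + 0.02)
                for i in range(len(data[0]))
            ]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, thetas, labels


def frequent_sets(rows, min_support, max_size=3):
    """Return {itemset: relative support} for itemsets meeting min_support.

    Brute-force enumeration over attribute index tuples up to max_size;
    adequate for a handful of attributes only.
    """
    if not rows:
        return {}
    found = {}
    for size in range(1, max_size + 1):
        for items in combinations(range(len(rows[0])), size):
            support = sum(all(row[i] for i in items) for row in rows) / len(rows)
            if support >= min_support:
                found[items] = support
    return found


# Toy 0-1 data (hypothetical): two groups that use disjoint attribute pairs.
data = [[1, 1, 0, 0]] * 18 + [[1, 0, 0, 0]] * 2 + [[0, 0, 1, 1]] * 18 + [[0, 0, 0, 1]] * 2

_, _, labels = em_bernoulli_mixture(data, k=2)
for c in range(2):
    cluster_rows = [row for row, lab in zip(data, labels) if lab == c]
    print(c, sorted(frequent_sets(cluster_rows, min_support=0.5)))
```

On data like this, each cluster ends up with its own collection of frequent sets, which is the kind of difference between clusters that the abstract reports. Hard assignment of rows to their most probable component is one possible way to form the clusters; the abstract does not say how rows are assigned.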

Bibliographic details

Source: SIAM International Conference on Data Mining | 2003 | xiv, 347 p. | 5 pages
Authors: Jaakko Hollmén; Jouni K. Seppänen; Heikki Mannila
Original format: PDF
Chinese Library Classification: TP3-53

Similar literature

1. A critical analysis of the combined usage of protein localization prediction methods: Increasing the number of independent data sets can reduce the accuracy of predicted mitochondrial localization [J]. Lythgow Kieren T., Hudson Gavin, Andras Peter. Mitochondrion, 2011, No. 3
2. Using general regression with local tuning for learning mixture models from incomplete data sets [J]. Ahmed R. Abas. Egyptian Informatics Journal, 2010, No. 2
3. Local and global recoding methods for anonymizing set-valued data [J]. Manolis Terrovitis, Nikos Mamoulis, Panos Kalnis. The VLDB Journal, 2011, No. 1
4. Mixture models and frequent sets: combining global and local methods for 0-1 data [C]. Jaakko Hollmen, Jouni K. Seppanen, Heikki Mannila. SIAM International Conference on Data Mining, 2003
5. Applying Recurrence Quantification Analysis Methods for the Analysis of Global Reanalysis and Model Data to Reveal the Local Oscillations of Multiple African Easterly Waves During 2006 [D]. Reyes, Tiffany Amber Lynn. 2018
6. A critical analysis of the combined usage of protein localization prediction methods: Increasing the number of independent data sets can reduce the accuracy of predicted mitochondrial localization [O]. Kieren T. Lythgow, Gavin Hudson, Peter Andras
7. Mixture Models and Frequent Sets: Combining Global and Local Methods for 0-1 Data [O]. Jaakko Hollmén, Jouni K. Seppänen, Heikki Mannila. 2003
8. An Extended Kalman Filter for frequent local and infrequent global sensor data fusion [R]. Stergios I. Roumeliotis, George A. Bekey. 1997
