Summarizing itemset patterns


Abstract

Frequent-pattern mining has been studied extensively, with a focus on scalable methods for mining various kinds of patterns, including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining lies not in efficiency but in interpretability, because the mining process generates a huge number of patterns.

In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a number of patterns small enough for a user to handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of the parameter K. Polynomial-time algorithms are developed, together with several optimization heuristics that improve efficiency.

Empirical studies indicate that we can obtain compact summarizations on real datasets.
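To make the idea of recovering supports from a representative more concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not spell out the generative model, so the sketch assumes that each representative is a "profile" holding a relative support and per-item probabilities, and that items occur independently within the transactions a profile covers. The names `estimate_support`, `restoration_error`, `profiles`, and `assignment` are hypothetical, and the numbers are made up for illustration.

```python
from math import prod


def estimate_support(profile, pattern):
    """Estimate a pattern's relative support from a single profile.

    Assumption (illustrative only): items are independent within the
    transactions the profile covers, so the estimate is the profile's
    support times the product of the item probabilities.
    """
    item_prob = profile["item_prob"]
    return profile["support"] * prod(item_prob.get(item, 0.0) for item in pattern)


def restoration_error(true_supports, profiles, assignment):
    """Average relative error between true and estimated supports.

    `assignment` maps each pattern to the index of the profile that
    represents it (a hypothetical input for this sketch).
    """
    total = 0.0
    for pattern, support in true_supports.items():
        estimate = estimate_support(profiles[assignment[pattern]], pattern)
        total += abs(support - estimate) / support
    return total / len(true_supports)


# Toy data: one profile summarizing three patterns over items "a" and "b".
profiles = [{"support": 0.5, "item_prob": {"a": 1.0, "b": 0.8}}]
true_supports = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.4,
    frozenset({"b"}): 0.5,  # the profile underestimates this one
}
assignment = {pattern: 0 for pattern in true_supports}

error = restoration_error(true_supports, profiles, assignment)
print(f"average relative restoration error: {error:.4f}")
```

Under these assumptions, one way to pick K would be to compute this error for increasing K and stop where adding another representative no longer reduces it much, which is in the spirit of the quality measure the abstract mentions.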


Bibliographic record

Source
ACM SIGKDD international conference on Knowledge discovery in data mining | 2005 | pp. 314-323 | 10 pages
Authors
Xifeng Yan; Hong Cheng; Jiawei Han; Dong Xin
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords
summarization

Similar literature

1. ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis [J]. Epaminondas Kapetanios. Computing reviews. 2021, No. 1
2. ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis [J]. M. Sohel Rahman. Computing reviews. 2020, No. 5
3. Summarizing Itemset Patterns Using Probabilistic Models [C]. Chao Wang, Srinivasan Parthasarathy. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06); 20060820-23; Philadelphia, PA (US). 2006
4. New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases [D]. Peterson, Erich Allen. 2012
5. Genetic Programming and Frequent Itemset Mining to Identify Feature Selection Patterns of iEEG and fMRI Epilepsy Data [O]. Otis Smart, Lauren Burrell
6. Summarizing Itemset Patterns Using Probabilistic Models [O]. Chao Wang, Srinivasan Parthasarathy. 2006
