

A Scalable Constant-Memory Sampling Algorithm for Pattern Discovery in Large Databases



Abstract

Many data mining tasks can be seen as an instance of the problem of finding the most interesting (according to some utility function) patterns in a large database. In recent years, significant progress has been achieved in scaling algorithms for this task to very large databases through the use of sequential sampling techniques. However, except for sampling-based greedy algorithms which cannot give absolute quality guarantees, the scalability of existing approaches to this problem is only with respect to the data, not with respect to the size of the pattern space: it is universally assumed that the entire hypothesis space fits in main memory. In this paper, we describe how this class of algorithms can be extended to hypothesis spaces that do not fit in memory while maintaining the algorithms' precise epsilon-delta quality guarantees. We present a constant memory algorithm for this task and prove that it possesses the required properties. In an empirical comparison, we compare variable memory and constant memory sampling.


Bibliographic record

Source: European Conference on Principles of Data Mining and Knowledge Discovery, 2002, 13 pages
Authors: Tobias Scheffer; Stefan Wrobel
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature


1. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms [J]. Carlos Domingo, Ricard Gavalda, Osamu Watanabe. Data Mining and Knowledge Discovery. 2002, No. 2

2. An Algorithm for Finding Frequently Appearing Long String Patterns from Large Scale Databases [J]. Takeaki Uno, Juzoh Umemori, Tsuyoshi Koide. 電子情報通信学会技術研究報告. システム数理と応用. Mathematical Systems Science and its Applications. 2013, No. 279

3. A Scalable Constant-Memory Sampling Algorithm for Pattern Discovery in Large Databases [C]. Tobias Scheffer, Stefan Wrobel. 6th European Conference on Principles of Data Mining and Knowledge Discovery, PKDD 2002, Aug 19-23, 2002, Helsinki, Finland. 2002

4. Relational discovery in sequentially-connected data streams: Efficient algorithms for lossless pattern discovery and change detection [D]. Coble, Jeffrey Allen. 2005

5. Methods to improve the quality of smoking records in a primary care EMR database: exploring multiple imputation and pattern-matching algorithms [O]. Stephanie Garies, Michael Cummings, Hude Quan. 2020

6. A Scalable Constant-Memory Sampling Algorithm for Pattern Discovery in Large Databases [O]. Tobias Scheffer, Stefan Wrobel. 2002
