Mining Complex Models from Arbitrarily Large Databases in Constant Time

Abstract

In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm is that its running time becomes independent of the size of the database, while the decisions made are essentially identical to those that would be made given infinite data. The method works within pre-specified memory limits and, as long as the data is iid, only requires accessing it sequentially. It gives anytime results, and can be used to produce batch, stream, time-changing and active-learning versions of an algorithm. We apply the method to learning Bayesian networks, developing an algorithm that is faster than previous ones by orders of magnitude, while achieving essentially the same predictive performance. We observe these gains on a series of large databases generated from benchmark networks, on the KDD Cup 2000 e-commerce data, and on a Web log containing 100 million requests.
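The abstract does not spell out the decision rule that lets the search stop after a bounded number of examples. In this line of work the usual instantiation is a Hoeffding-style concentration bound: the algorithm scans examples sequentially and commits to the empirically best candidate at a search step as soon as its lead over the runner-up exceeds the bound, so that the choice matches the infinite-data choice with probability at least 1 − δ. The sketch below illustrates that idea only; the names `Candidate`, `hoeffding_bound`, `choose_when_confident`, and `score_fn` are illustrative and not taken from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """One alternative at a discrete search step (e.g. an edge to add to a Bayesian network)."""
    name: str
    score_sum: float = 0.0  # running sum of per-example scores
    n: int = 0              # number of examples this candidate has been scored on

    def update(self, example_score: float) -> None:
        self.score_sum += example_score
        self.n += 1

    @property
    def mean(self) -> float:
        return self.score_sum / self.n if self.n else 0.0


def hoeffding_bound(score_range: float, n: int, delta: float) -> float:
    """With probability at least 1 - delta, the mean of n i.i.d. scores whose range
    has width score_range lies within this distance of the true mean."""
    return math.sqrt(score_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_when_confident(candidates, stream, score_fn, score_range,
                          delta=1e-6, max_examples=1_000_000):
    """Scan the data sequentially and stop as soon as the leading candidate's advantage
    over the runner-up exceeds the Hoeffding bound (or an example cap is hit).
    The number of examples needed depends on delta and the score gap between
    candidates, not on the total size of the database."""
    n = 0
    for example in stream:
        n += 1
        for c in candidates:
            c.update(score_fn(c, example))
        best, runner_up = sorted(candidates, key=lambda c: c.mean, reverse=True)[:2]
        if best.mean - runner_up.mean > hoeffding_bound(score_range, n, delta) or n >= max_examples:
            return best, n
    return max(candidates, key=lambda c: c.mean), n


if __name__ == "__main__":
    # Toy usage: two hypothetical candidates whose per-example scores differ by 0.05 on average.
    cands = [Candidate("add edge A->B"), Candidate("add edge B->A")]
    toy_stream = (random.random() for _ in range(10 ** 9))  # stands in for an arbitrarily large database
    winner, used = choose_when_confident(
        cands, toy_stream,
        score_fn=lambda c, x: x + (0.05 if c.name == "add edge A->B" else 0.0),
        score_range=1.05, delta=1e-6)
    print(winner.name, "chosen after", used, "examples")
```

In this toy run the decision is made after a few thousand examples regardless of whether the stream holds a million or a billion records, which mirrors the abstract's claim that the running time becomes independent of database size while the decisions remain essentially those that would be made given infinite data.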
