首页> 外文会议>SIAM International Conference on Data Mining >Nonparametric Density Estimation: Toward Computational Tractability
【24h】

Nonparametric Density Estimation: Toward Computational Tractability

机译:非参数密度估计:朝向计算途径

获取原文

摘要

Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation can be divided into two principal classes, parametric methods, such as Bayesian networks, and nonparametric methods such as kernel density estimation and smoothing splines. While neither choice should be universally preferred for all situations, a well-known benefit of nonparametric methods is their ability to achieve estimation optimality for ANY input distribution as more data are observed, a property that no model with a parametric assumption can have, and one of great importance in exploratory data analysis and mining where the underlying distribution is decidedly unknown. To date, however, despite a wealth of advanced underlying statistical theory, the use of nonparametric methods has been limited by their computational intractibility for all but the smallest datasets. In this paper, we present an algorithm for kernel density estimation, the chief nonparametric approach, which is dramatically faster than previous algorithmic approaches in terms of both dataset size and dimensionality. Furthermore, the algorithm provides arbitrarily tight accuracy guarantees, provides anytime convergence, works for all common kernel choices, and requires no parameter tuning. The algorithm is an instance of a new principle of algorithm design: multi-recursion, or higher-order divide-and-conquer.
机译:密度估计是几乎所有的概率学习方法核心操作(相对于辨别方法)。方法密度估计可以分为两个主要的类别,参数的方法,例如贝叶斯网络,和非参数的方法,例如核密度估计和平滑样条。虽然没有选择应普遍首选的所有情况下,非参数方法的公知的好处是它们作为多个数据中观察到,以实现估计最优对于任何输入分配能力,属性,与一个参数假设没有模型可以具有,和一个在其中底层分布是决定性的未知探索性数据分析和挖掘具有重要意义。到目前为止,然而,尽管有丰富先进的底层统计理论,运用非参数方法已被其计算intractibility为所有,但最小数据集的限制。在本文中,我们提出了核密度估计,主要非参数方法,这种方法大大加快比以前的算法方法在这两个数据集的大小和维方面的算法。此外,该算法提供任意紧精度的质量担保,提供随时随地收敛,适用于所有的通用内核的选择,无需参数整定。该算法的算法设计的新原理的实例:多递归或高阶的分而治之。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号