Optimal kernel choice for large-scale two-sample tests

Abstract

Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
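The abstract combines three pieces: a statistic whose cost is linear in the sample size, a threshold at a chosen level, and kernel selection aimed at power. The short NumPy sketch below illustrates how these can fit together. It rests on several assumptions. It uses a Gaussian kernel and a hypothetical grid of candidate bandwidths. The helper names `gaussian_kernel_rows`, `h_values`, `linear_mmd_test` and `select_bandwidth` are introduced here for illustration only.

The statistic averages h(v) = k(x, x') + k(y, y') - k(x, y') - k(x', y) over disjoint blocks of two points from each sample, which is why its cost is linear in the sample size. The threshold uses a normal approximation to the null distribution of that average. For selection, the sketch picks the bandwidth with the largest estimated ratio of the statistic to its standard deviation on a held-out split. Treat this as an illustrative power proxy, not as the paper's exact optimization procedure.

```python
import math
from statistics import NormalDist

import numpy as np


def gaussian_kernel_rows(a, b, bandwidth):
    """exp(-||a_i - b_i||^2 / (2 * bandwidth^2)), one value per row pair.

    Assumption: a Gaussian kernel; the abstract does not fix a kernel family.
    """
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def h_values(x, y, bandwidth):
    """Block terms h(v_i) = k(x, x') + k(y, y') - k(x, y') - k(x', y).

    Each block v_i uses two fresh points from each sample, so the m2 = m // 2
    blocks need O(m) kernel evaluations in total.
    """
    m2 = min(len(x), len(y)) // 2
    x1, x2 = x[0:2 * m2:2], x[1:2 * m2:2]
    y1, y2 = y[0:2 * m2:2], y[1:2 * m2:2]
    return (gaussian_kernel_rows(x1, x2, bandwidth)
            + gaussian_kernel_rows(y1, y2, bandwidth)
            - gaussian_kernel_rows(x1, y2, bandwidth)
            - gaussian_kernel_rows(x2, y1, bandwidth))


def linear_mmd_test(x, y, bandwidth, alpha=0.05):
    """Return (statistic, threshold, reject) for a level-alpha linear-time MMD test.

    The statistic is the mean of m2 i.i.d. block terms; under H0 it is
    approximately N(0, sigma^2 / m2), which gives the threshold
    sigma / sqrt(m2) * Phi^{-1}(1 - alpha).
    """
    h = h_values(x, y, bandwidth)
    stat = float(h.mean())
    sigma = float(h.std(ddof=1))  # sample standard deviation of the block terms
    threshold = sigma / math.sqrt(len(h)) * NormalDist().inv_cdf(1.0 - alpha)
    return stat, threshold, stat > threshold


def select_bandwidth(x_train, y_train, bandwidths, reg=1e-8):
    """Return the candidate bandwidth with the largest estimated stat / std ratio.

    reg is a small hypothetical regularizer that avoids division by zero.
    """
    def ratio(bandwidth):
        h = h_values(x_train, y_train, bandwidth)
        return h.mean() / (h.std(ddof=1) + reg)

    return max(bandwidths, key=ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000
    x = rng.normal(size=(n, 2))
    y = rng.normal(size=(n, 2)) + np.array([1.0, 0.0])  # mean-shifted q
    half = n // 2
    # Choose the kernel on the first half, then test on the held-out second half.
    bw = select_bandwidth(x[:half], y[:half], [0.25, 0.5, 1.0, 2.0, 4.0])
    stat, thr, reject = linear_mmd_test(x[half:], y[half:], bw, alpha=0.05)
    print(f"bandwidth={bw}, statistic={stat:.4f}, threshold={thr:.4f}, reject={reject}")
```

The statistic and its variance depend only on running sums of h over the blocks. In principle, the same computation could therefore be carried out in a single pass over a data stream. The vectorized version above keeps all samples in memory for brevity. Selecting the bandwidth on data disjoint from the test half keeps the chosen kernel independent of the test statistic.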

Bibliographic details

Source: Annual Conference on Neural Information Processing Systems, 2012, pp. 1205-1213 (9 pages)
Authors: Arthur Gretton; Bharath Sriperumbudur; Dino Sejdinovic; Heiko Strathmann; Sivaraman Balakrishnan; Massimiliano Pontil; Kenji Fukumizu
Original format: PDF

Similar literature

1. Asymptotically Optimal One- and Two-Sample Testing With Kernels [J]. Zhu Shengyu, Chen Biao, Chen Zhitang. IEEE Transactions on Information Theory, 2021, No. 4.
2. Testing-optimal kernel choice in HAR inference [J]. Sun Yixiao, Yang Jingjing. Journal of Econometrics, 2020, No. 1.
3. Asymptotic Efficiency and Local Optimality of Tests Based on Two-Sample U- and V-Statistics [J]. V. V. Litvinova, Ya. Yu. Nikitin. Journal of Mathematical Sciences, 2008, No. 6.
4. Optimal kernel choice for large-scale two-sample tests [C]. Arthur Gretton, Bharath Sriperumbudur, Dino Sejdinovic. Annual Conference on Neural Information Processing Systems, 2012.
5. On the Construction of Minimax Optimal Nonparametric Tests with Kernel Embedding Methods [D]. Li, Tong. 2021.
6. Ultra-Broadband Lithography-Free and Large-Scale Compatible Perfect Absorbers: The Optimum Choice of Metal layers in Metal-Insulator Multilayer Stacks [O]. Sina Abedini Dereshgi, Amir Ghobadi, Hodjat Hajian.
7. B-tests: Low Variance Kernel Two-Sample Tests [O]. Zaremba, Wojciech; Gretton, Arthur; Blaschko, Matthew. 2014.
