Conference on Neural Information Processing Systems

Comparing distributions: l_1 geometry improves kernel two-sample testing

Abstract

Are two sets of observations drawn from the same distribution? This problem is a two-sample test. Kernel methods lead to many appealing properties. Indeed, state-of-the-art approaches use the L^2 distance between kernel-based distribution representatives to derive their test statistics. Here, we show that L^p distances (with p ≥ 1) between these distribution representatives give metrics on the space of distributions that are well-behaved for detecting differences between distributions, as they metrize weak convergence. Moreover, for analytic kernels, we show that the L^1 geometry gives improved testing power for scalable computational procedures. Specifically, we derive a finite-dimensional approximation of the metric, given as the l_1 norm of a vector that captures differences of expectations of analytic functions evaluated at spatial locations or frequencies (i.e., features). The features can be chosen to maximize the differences between the distributions and give interpretable indications of how they differ. Using an l_1 norm gives better detection because, with analytic kernels, the differences between representatives are dense (non-zero almost everywhere). The tests are consistent while being much faster than state-of-the-art quadratic-time kernel-based tests. Experiments on artificial and real-world problems demonstrate a better power/time tradeoff than the state of the art, which is based on l_2 norms, and in some cases better outright power than even the most expensive quadratic-time tests.
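The finite-dimensional statistic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian kernel (which is analytic), randomly drawn test locations rather than optimized ones, and an unnormalized l_1 norm of the difference of mean features. The abstract does not say how the test threshold is obtained, so a permutation test is swapped in here as a simple calibration. The names gaussian_features, l1_statistic, and permutation_pvalue are hypothetical.

```python
import numpy as np


def gaussian_features(samples, locations, bandwidth=1.0):
    """Evaluate a Gaussian kernel between each sample and each test location.

    samples: (n, d) array, locations: (J, d) array -> (n, J) array.
    """
    sq_dists = ((samples[:, None, :] - locations[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def l1_statistic(x, y, locations, bandwidth=1.0):
    """Unnormalized l_1 norm of the difference of mean features (assumption:
    any normalization used in the paper is not reproduced here)."""
    mean_diff = (gaussian_features(x, locations, bandwidth).mean(axis=0)
                 - gaussian_features(y, locations, bandwidth).mean(axis=0))
    return float(np.abs(mean_diff).sum())


def permutation_pvalue(x, y, locations, bandwidth=1.0, n_perm=200, seed=0):
    """Calibrate the l_1 statistic by permuting sample labels.

    The permutation test is a stand-in chosen for this sketch, not a
    procedure taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    # Features are computed once; each permutation only re-averages rows,
    # so the cost stays linear in the number of samples for fixed J.
    feats = gaussian_features(np.vstack([x, y]), locations, bandwidth)
    n = len(x)

    def stat(order):
        diff = feats[order[:n]].mean(axis=0) - feats[order[n:]].mean(axis=0)
        return np.abs(diff).sum()

    observed = stat(np.arange(len(feats)))
    exceed = sum(stat(rng.permutation(len(feats))) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(300, 2))
    y = rng.normal(loc=1.5, size=(300, 2))   # mean-shifted alternative
    locations = rng.normal(size=(5, 2))      # J = 5 random test locations
    print("l_1 statistic:", l1_statistic(x, y, locations))
    print("permutation p-value:", permutation_pvalue(x, y, locations))
```

Replacing np.abs(diff).sum() with (diff ** 2).sum() gives the corresponding l_2-based statistic, which makes it easy to compare the two geometries on the same features.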
