Conference on Neural Information Processing Systems (NeurIPS)

Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions



Abstract

Embeddings of probability measures into reproducing kernel Hilbert spaces have been proposed as a straightforward and practical means of representing and comparing probabilities. In particular, the distance between embeddings (the maximum mean discrepancy, or MMD) has several key advantages over many classical metrics on distributions, namely easy computability, fast convergence and low bias of finite sample estimates. An important requirement of the embedding RKHS is that it be characteristic: in this case, the MMD between two distributions is zero if and only if the distributions coincide. Three new results on the MMD are introduced in the present study. First, it is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification. An important consequence is that a kernel must be characteristic to guarantee classifiability between distributions in the RKHS. Second, the class of characteristic kernels is broadened to incorporate all strictly positive definite kernels: these include non-translation invariant kernels and kernels on non-compact domains. Third, a generalization of the MMD is proposed for families of kernels, as the supremum over MMDs on a class of kernels (for instance the Gaussian kernels with different bandwidths). This extension is necessary to obtain a single distance measure if a large selection or class of characteristic kernels is potentially appropriate. This generalization is reasonable, given that it corresponds to the problem of learning the kernel by minimizing the risk of the corresponding kernel classifier. The generalized MMD is shown to have consistent finite sample estimates, and its performance is demonstrated on a homogeneity testing example.
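The abstract's central quantities are computable from samples alone. Below is a minimal numpy sketch (not the authors' code) of the biased squared-MMD estimate with a Gaussian kernel, and of the generalized MMD taken as the supremum over a small family of bandwidths; the bandwidth grid and sample sizes are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth):
    # Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth):
    # Biased (V-statistic) estimate of squared MMD between samples X ~ P and Y ~ Q:
    # ||mean embedding of X - mean embedding of Y||^2 in the RKHS, so it is >= 0.
    Kxx = rbf_kernel(X, X, bandwidth)
    Kyy = rbf_kernel(Y, Y, bandwidth)
    Kxy = rbf_kernel(X, Y, bandwidth)
    return Kxx.mean() - 2.0 * Kxy.mean() + Kyy.mean()

def generalized_mmd2(X, Y, bandwidths):
    # Generalized MMD over a kernel family: supremum of MMD^2 across the
    # class of Gaussian kernels with the given bandwidths.
    return max(mmd2_biased(X, Y, b) for b in bandwidths)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))  # sample from P
Y = rng.normal(1.0, 1.0, size=(200, 1))  # sample from Q (shifted mean)
Z = rng.normal(0.0, 1.0, size=(200, 1))  # second independent sample from P

bandwidths = [0.5, 1.0, 2.0]
print(generalized_mmd2(X, Y, bandwidths))  # distributions differ: estimate is clearly positive
print(generalized_mmd2(X, Z, bandwidths))  # same distribution: estimate is much smaller
```

Taking the supremum over bandwidths mirrors the paper's point that a single fixed kernel may be poorly matched to the data scale, while the maximizing kernel corresponds to minimizing the risk of the associated kernel classifier.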

