Journal: 電子情報通信学会技術研究報告. 情報論的学習理論と機械学習

A Bayesian Estimator of Mutual Information and its Application to Tests of Independence


Abstract

In this paper, we propose a Bayesian estimator of mutual information and show that it performs satisfactorily on the independence testing problem. The proposed estimator is strongly consistent even for continuous variables. Our algorithm generates a finite number of quantizations and computes the probability of the discrete sequence obtained by each quantizer. Although the number K of quantizations needs to be finite, we found that setting K = log n was a reasonable choice that worked well in our experiments. The proposed algorithm requires O(n log n) computation time and O(n) memory. We compared the performance of the proposed estimator with that of the HSIC (Hilbert-Schmidt independence criterion), the currently preferred independence testing principle. The proposed method is based on maximizing the posterior probability given the prior probability p and the data x^n, y^n, whereas the HSIC detects abnormality given the null hypothesis and the data. While we could not determine which method is superior in a general setting, we did find that the HSIC only considers the magnitude of the data, which causes it to miss underlying relations that cannot be detected merely by observing changes in magnitude. The greatest merit of the proposed algorithm compared to the HSIC is its efficiency. The HSIC requires O(n^3) computation for a test. Moreover, prior to testing, we need to simulate the null hypothesis and set a threshold such that the algorithm decides that the data are independent if and only if the HSIC value is less than the threshold. In this sense, executing the HSIC is time consuming. In future work, we intend to identify the exact boundary that determines which method is preferable with regard to the accuracy of independence testing. We also intend to publish the R program as a package.
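The quantize-and-score idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm or their R program. Several choices below are assumptions made only for illustration: quantizer k uses k + 1 equal-frequency bins per axis, the probability of each discrete sequence is computed under a symmetric Dirichlet(1/2) mixture, log is taken as the natural logarithm, and the names `ranks`, `log_marginal`, `bayes_mi`, and `is_independent` are hypothetical. The ranks are computed once with a sort, so the sketch runs in O(n log n) time.

```python
import math
from collections import Counter


def ranks(values):
    """Return the rank (0..n-1) of each value; one sort, O(n log n)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank
    return out


def log_marginal(symbols, alphabet_size, alpha=0.5):
    """Log-probability of a discrete sequence under a symmetric
    Dirichlet(alpha) mixture (assumed choice of Bayesian measure)."""
    n = len(symbols)
    total = math.lgamma(alphabet_size * alpha)
    total -= math.lgamma(n + alphabet_size * alpha)
    for count in Counter(symbols).values():
        total += math.lgamma(count + alpha) - math.lgamma(alpha)
    return total


def bayes_mi(x, y, prior_indep=0.5):
    """Illustrative Bayesian mutual-information score in nats.

    For each of K = floor(log n) quantizers, compare the probability of
    the quantized pair sequence with the product of the probabilities of
    the two quantized marginals, weighted by the prior probability of
    independence. The result is the largest score, floored at 0.
    """
    n = len(x)
    num_quantizers = max(1, int(math.log(n)))  # K = log n (natural log assumed)
    log_odds = math.log(1.0 - prior_indep) - math.log(prior_indep)
    rank_x, rank_y = ranks(x), ranks(y)
    best = 0.0
    for k in range(1, num_quantizers + 1):
        bins = k + 1  # assumption: quantizer k has k + 1 bins per axis
        qx = [r * bins // n for r in rank_x]
        qy = [r * bins // n for r in rank_y]
        joint = [a * bins + b for a, b in zip(qx, qy)]
        score = (log_odds
                 + log_marginal(joint, bins * bins)
                 - log_marginal(qx, bins)
                 - log_marginal(qy, bins)) / n
        best = max(best, score)
    return best


def is_independent(x, y, prior_indep=0.5):
    """Decide independence when no quantizer favors dependence."""
    return bayes_mi(x, y, prior_indep) <= 0.0
```

Calling `bayes_mi(x, y)` on two equal-length lists of numbers returns a nonnegative score, and `is_independent(x, y)` reports independence when that score is zero. Unlike a threshold-based test, this decision needs no simulation of the null hypothesis. The only tuning input is the prior probability `prior_indep`, which plays the role of p in the abstract.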
