Published in: Machine Learning

On some graph-based two-sample tests for high dimension, low sample size data

Abstract

Testing for equality of two high-dimensional distributions is a challenging problem, and it becomes even more challenging when the sample size is small. Over the last few decades, several graph-based two-sample tests have been proposed in the literature, which can be used for data of arbitrary dimensions. Most of these test statistics are computed using pairwise Euclidean distances among the observations. However, because pairwise Euclidean distances concentrate in high dimensions, these tests perform poorly in many high-dimensional problems. Some of them can even have power below the nominal level when the scale-difference between the two distributions dominates the location-difference. To overcome these limitations, we introduce some new dissimilarity indices and use them to modify some popular graph-based tests. These modified tests use the distance concentration phenomenon to their advantage, and as a result, they outperform the corresponding tests based on the Euclidean distance in a wide variety of examples. We establish the high-dimensional consistency of these modified tests under fairly general conditions. Analyzing several simulated as well as real data sets, we demonstrate their usefulness in high dimension, low sample size situations.
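
To make the setting concrete, the following is a minimal sketch of one generic graph-based two-sample test of the kind the abstract refers to: a nearest-neighbor test computed from pairwise Euclidean distances and calibrated by permuting the sample labels. It is not the modified test proposed in the paper, since the abstract does not define the new dissimilarity indices. The function names (`pairwise_distances`, `nearest_neighbors`, `nn_statistic`, `nn_permutation_test`) and the simulated data are assumptions made purely for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Return the symmetric matrix of Euclidean distances between points."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def nearest_neighbors(dist):
    """For each observation, return the index of its nearest other observation."""
    n = len(dist)
    return [
        min((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for i in range(n)
    ]


def nn_statistic(neighbors, labels):
    """Fraction of observations whose nearest neighbor has the same label."""
    same = sum(labels[i] == labels[j] for i, j in enumerate(neighbors))
    return same / len(labels)


def nn_permutation_test(x, y, n_perm=500, seed=0):
    """Nearest-neighbor two-sample test with a permutation p-value.

    Large values of the statistic suggest the two samples differ.
    Illustrative sketch only; not the method proposed in the paper.
    """
    rng = random.Random(seed)
    points = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    # The neighbor graph depends only on the pooled data, so it is built once.
    neighbors = nearest_neighbors(pairwise_distances(points))
    observed = nn_statistic(neighbors, labels)
    exceed = 0
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)
        if nn_statistic(neighbors, permuted) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical high dimension, low sample size data: 15 + 15 points in 50 dimensions.
    rng = random.Random(1)
    x = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(15)]
    y = [[rng.gauss(3.0, 1.0) for _ in range(50)] for _ in range(15)]
    stat, p = nn_permutation_test(x, y)
    print("statistic:", stat, "p-value:", p)
```

The demonstration uses a pure location shift, the case in which a Euclidean-distance test is expected to do well. The abstract's point is that such tests can fail when the scale-difference dominates the location-difference, which is the situation the paper's modified dissimilarities are meant to address.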
