Given samples from an unknown distribution p and a description of a distribution q, are p and q close or far? This question of "identity testing" has received significant attention in the case of testing whether p and q are equal or far in total variation distance. However, in recent work [VV11a, ADK15, DP17], the following questions have been been critical to solving problems at the frontiers of distribution testing: 1. Alternative Distances: Can we test whether p and q are far in other distances, say Hellinger? 2. Tolerance: Can we test when p and q are close, rather than equal? And if so, close in which distances? Motivated by these questions, we characterize the complexity of distribution testing under a variety of distances, including total variation, l_2, Hellinger, Kullback-Leibler, and χ~2. For each pair of distances d_1 and d_2, we study the complexity of testing if p and q are close in d_1 versus far in d_2, with a focus on identifying which problems allow strongly sublinear testers (i.e., those with complexity O(n~(1-γ)) for some γ > 0 where n is the size of the support of the distributions p and q). We provide matching upper and lower bounds for each case. We also study these questions in the case where we only have samples from q (equivalence testing), showing qualitative differences from identity testing in terms of when tolerance can be achieved. Our algorithms fall into the classical paradigm of χ~2-statistics, but require crucial changes to handle the challenges introduced by each distance we consider. Finally, we survey other recent results in an attempt to serve as a reference for the complexity of various distribution testing problems.
展开▼