We study the problem of detecting outlier pairs of strongly correlated variables among a collection of n variables with otherwise weak pairwise correlations. After normalization, this task amounts to the geometric task where we are given as input a set of n vectors with unit Euclidean norm and dimension d, and we are asked to find all the outlier pairs of vectors whose inner product is at least ρ in absolute value, subject to the promise that all but at most q pairs of vectors have inner product at most τ in absolute value for some constants 0 < τ < ρ < 1. Improving on an algorithm of G. Valiant [FOCS 2012; J.ACM 2015], we present a randomized algorithm that for Boolean inputs ({-1, 1}-valued data normalized to unit Euclidean length) runs in time O(n~(max {1-γ+M(Δγ,γ), M(1-γ,2Δγ)}) + qdn~(2γ)), where 0 < γ < 1 is a constant tradeoff parameter and M(μ, ν) is the exponent to multiply an [n~μ] × [n~ν] matrix with an [n~ν] × [n~μ] matrix and Δ = 1/(1 - log_τ ρ). As corollaries we obtain randomized algorithms that run in time O(n(2ω)/(3-log_τ ρ) + qdn(2(1-log_τ ρ))/(3-log_τ ρ)) and in time O(n4/(2+α(1 - log_τ ρ)) + qdn(2α(1 - log_τ ρ))/(2+α(1 - log_τ ρ))), where 2 ≤ ω < 2.38 is the exponent for square matrix multiplication and 0.3 < α ≤ 1 is the exponent for rectangular matrix multiplication. We present further corollaries for the light bulb problem and for learning sparse Boolean functions. (The notation O (·) hides polylogarithmic factors in n and d whose degree may depend on ρ and τ.)
展开▼