We study the problem of finding an outlier-free subset of a set of points (or a probability distribution) in n-dimensional Euclidean space. As in [BFKV 99], a point x is defined to be a beta-outlier if there exists some direction it; in which its squared distance from the mean along w is greater than beta times the average squared distance from the mean along w. Our main theorem is that for any epsilon > 0, there exists a (1 - epsilon) fraction of the original distribution that has no O((n)/(epsilon)(b + log(n)/(epsilon)))-outliers, improving on the previous bound of 0(n(7) b/e). This is asymptotically the best possible, as shown by a matching lower bound. The theorem is constructive, and results in a (1)/(1-epsilon) approximation to the following optimization problem: given a distribution mu(i.e. the ability to sample from it), and a parameter epsilon > 0, find the minimum beta for which there exists a subset of probability at least (1 - epsilon) with no beta-outliers. (C) 2003 Elsevier Inc. All rights reserved. [References: 8]
展开▼