When data mining techniques are applied to uncertain data, the uncertainty must be taken into account to obtain high-quality results. An uncertain object is usually described by a probability density function (PDF), which is approximated by a large number of sample points, and the distance between two uncertain objects is expressed as their expected distance. Computing the expected distance is costly because it involves a double integral over the PDFs of the two uncertain objects, evaluated with a large number of sample points. This cost is a critical bottleneck for some uncertain data mining techniques. In this paper, we present a simple and efficient formula for evaluating the distance between two uncertain objects. We also apply the formula to nearest-neighbor classification. Experiments on datasets derived from the UCI datasets and on the plant dataset of the “Three Parallel Rivers of Yunnan Protected Area” verify that the formula is effective and efficient.
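To make the cost concrete, the following Python/NumPy sketch estimates the expected distance from sample points by comparing every pair of samples. For objects with n and m samples in d dimensions this takes O(nmd) work. It is an illustration under stated assumptions, not the formula proposed in this paper. For the special case of squared Euclidean distance between independent objects, the well-known identity E‖X − Y‖² = ‖μ_X − μ_Y‖² + tr(Σ_X) + tr(Σ_Y) reduces the cost to O((n + m)d). All function names, including the nearest-neighbor helper `nn_classify`, are hypothetical.

```python
import numpy as np

def expected_distance_naive(a, b):
    """Sample-based estimate of E||X - Y|| (Euclidean).
    a: (n, d) samples of object X; b: (m, d) samples of object Y.
    Compares every pair of samples, so the cost is O(n * m * d)."""
    diffs = a[:, None, :] - b[None, :, :]          # shape (n, m, d)
    return np.linalg.norm(diffs, axis=2).mean()

def expected_sq_distance_naive(a, b):
    """Same all-pairs estimate, but for the squared Euclidean distance."""
    diffs = a[:, None, :] - b[None, :, :]
    return (diffs ** 2).sum(axis=2).mean()

def expected_sq_distance_closed(a, b):
    """Closed form for squared Euclidean distance, assuming X and Y are independent:
    ||mean_a - mean_b||^2 + tr(Cov_a) + tr(Cov_b), with population (ddof=0) variances.
    Exactly equals the all-pairs average above, at O((n + m) * d) cost."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    return ((mu_a - mu_b) ** 2).sum() + a.var(axis=0).sum() + b.var(axis=0).sum()

def nn_classify(query, train, labels):
    """Hypothetical 1-NN classifier: returns the label of the training object
    with the smallest expected squared distance to the query object."""
    dists = [expected_sq_distance_closed(query, obj) for obj in train]
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 4))
y = rng.normal(2.0, 0.5, size=(400, 4))
print(expected_sq_distance_naive(x, y), expected_sq_distance_closed(x, y))
```

Ranking by expected squared distance can differ from ranking by expected distance. The sketch uses the squared form only because it admits this simple closed form.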