We show that the relevant information of a supervised learning problem is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem in the sense that it can asymptotically represent the function to be learned and is sufficiently smooth. Thus, kernels not only transform data sets such that good generalization can be achieved using only linear discriminant functions, but this transformation is also performed in a manner which makes economical use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data for supervised learning problems. Practically, we propose an algorithm which enables us to recover the number of leading kernel PCA components relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to aid in model selection, and (3) to denoise in feature space in order to yield better classification results.
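As an illustration of the component-selection idea, the following minimal sketch estimates how many leading kernel PCA components carry the label information, by projecting the (centered) labels onto the kernel eigenbasis and finding the smallest dimension that captures a given fraction of the label variance. This is a simplified stand-in for the paper's algorithm, not its exact criterion; the function name relevant_kpca_dimension, the RBF kernel width gamma, and the variance threshold are illustrative assumptions.

```python
import numpy as np

def relevant_kpca_dimension(X, y, gamma=1.0, threshold=0.95):
    """Estimate the number of leading kernel PCA components needed
    to capture a fraction `threshold` of the label variance.
    (Illustrative criterion; gamma and threshold are hypothetical
    defaults, not values from the paper.)"""
    n = X.shape[0]
    # RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Center the kernel matrix in feature space
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # Eigendecomposition, sorted by decreasing eigenvalue
    w, V = np.linalg.eigh(Kc)
    V = V[:, np.argsort(w)[::-1]]
    # Squared contribution of each kernel PCA component to the labels
    yc = y - y.mean()
    contrib = (V.T @ yc) ** 2
    cum = np.cumsum(contrib) / np.sum(contrib)
    # Smallest d whose cumulative contribution reaches the threshold
    return int(np.searchsorted(cum, threshold) + 1)
```

Given such an estimate d, projecting the centered label vector onto the first d eigenvectors, e.g. y_hat = V[:, :d] @ (V[:, :d].T @ yc), corresponds to the kind of feature-space denoising described above.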