The k-nearest-neighbor (k-NN) classifier has been applied to the identification of cancer samples from gene expression profiles with encouraging results. However, the performance of k-NN depends strongly on the distance used to evaluate sample proximities, and choosing a good dissimilarity is a difficult, problem-dependent task. In this paper, we learn a linear combination of dissimilarities using a regularized version of the kernel alignment algorithm. The error function can be optimized with a semi-definite programming approach and incorporates a term that penalizes the complexity of the family of distances, thereby avoiding overfitting. The proposed method has been applied to the challenging problem of cancer identification from gene expression profiles. Kernel-alignment k-NN outperforms other metric learning strategies and improves on the classical k-NN based on a single dissimilarity.
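As a minimal illustration of the idea behind kernel alignment, the sketch below builds kernels from two candidate dissimilarities (Euclidean and Manhattan, via a Gaussian transform) and measures the empirical alignment of their convex combination with the ideal label kernel yyᵀ. The grid search over the mixing weight is a deliberate simplification for illustration; the paper optimizes a regularized alignment criterion with semi-definite programming, and all function names and parameter values here are hypothetical.

```python
import numpy as np

def alignment(K1, K2):
    # Empirical kernel alignment: normalized Frobenius inner product
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

def gaussian_kernel(D, sigma):
    # Turn a dissimilarity matrix D into a kernel (hypothetical choice)
    return np.exp(-D**2 / (2 * sigma**2))

# Toy two-class data standing in for gene expression profiles
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 5)), rng.normal(2, 1, (10, 5))])
y = np.array([-1] * 10 + [1] * 10)
K_ideal = np.outer(y, y)  # ideal target kernel y y^T

# Two candidate dissimilarity matrices
D_euc = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))   # Euclidean
D_man = np.abs(X[:, None] - X[None]).sum(-1)             # Manhattan
K1, K2 = gaussian_kernel(D_euc, 2.0), gaussian_kernel(D_man, 5.0)

# Grid search over the convex combination weight (simplified stand-in
# for the SDP optimization of the regularized alignment)
best_align, best_w = max(
    (alignment(w * K1 + (1 - w) * K2, K_ideal), w)
    for w in np.linspace(0, 1, 21)
)
print(f"best weight on Euclidean kernel: {best_w:.2f}, alignment: {best_align:.3f}")
```

The learned weight would then define the combined dissimilarity fed to k-NN; higher alignment indicates a kernel whose induced geometry better separates the classes.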