The increased collection of high-dimensional data in various fields has generated strong interest in clustering algorithms and variable selection procedures. In this dissertation, I propose a model-based method that addresses the two problems simultaneously. I use Dirichlet process mixture models to define the cluster structure and introduce a latent binary vector into the model to identify discriminating variables. I update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. I evaluate the method on simulated data and illustrate an application with a DNA microarray study. I also show that the methodology can be adapted to the problem of clustering high-dimensional functional data. There I employ wavelet thresholding methods to reduce the dimension of the data and to remove noise from the observed curves. I then apply variable selection and sample clustering methods in the wavelet domain. Thus my methodology is wavelet-based and aims at clustering the curves while identifying the wavelet coefficients that describe discriminating local features. I exemplify the method on high-dimensional, high-frequency tidal volume traces measured under an induced panic attack model in normal humans.