Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene association studies. However, their use is limited by the cost of genotyping the very large number of SNPs involved. It is therefore essential to select only an informative subset (tag SNPs) from all SNPs. Several promising methods for tag SNP selection have been proposed, including haplotype block-based and block-free approaches. Some researchers prefer the block-free methods because most block-based methods rely on strong assumptions, such as prior block partitioning, bi-allelic SNPs, or a fixed number or fixed locations of tag SNPs. We applied the feature selection idea of binary particle swarm optimization (binary PSO) to find informative tag SNPs. This method is very efficient, as it does not rely on block partitioning of the genomic region. On four public data sets, the method consistently identified tag SNPs with considerably better prediction ability than STAMPA. Moreover, the method retains its performance both when only a very small number of tag SNPs is selected and when 100% prediction accuracy is required.
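To make the idea of binary PSO as a tag SNP selector concrete, the following is a minimal sketch and not the authors' implementation. Each particle is a 0/1 mask over SNP positions, velocities are squashed through a sigmoid to give the probability that a bit is set, and the fitness rewards how well the untagged SNPs can be predicted from the tagged ones. Everything specific here is an assumption made for illustration: the function names `loo_accuracy` and `binary_pso`, the leave-one-out nearest-neighbour predictor used as the fitness, the tag-count penalty, the parameter values, and the toy haplotypes.

```python
import math
import random


def loo_accuracy(haplotypes, mask):
    """Leave-one-out accuracy of predicting untagged SNPs from tagged ones.

    Assumed stand-in predictor: copy the untagged alleles from the other
    haplotype that is closest (Hamming distance) on the tag SNPs.
    """
    tags = [j for j, bit in enumerate(mask) if bit]
    rest = [j for j, bit in enumerate(mask) if not bit]
    if not tags:
        return 0.0  # nothing to predict from
    if not rest:
        return 1.0  # every SNP is genotyped, nothing left to predict
    correct = 0
    for i, hap in enumerate(haplotypes):
        others = (h for k, h in enumerate(haplotypes) if k != i)
        nearest = min(others, key=lambda h: sum(h[j] != hap[j] for j in tags))
        correct += sum(nearest[j] == hap[j] for j in rest)
    return correct / (len(haplotypes) * len(rest))


def binary_pso(haplotypes, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5,
               v_max=4.0, penalty=0.1, seed=0):
    """Search for a tag SNP mask with binary PSO (illustrative parameters)."""
    rng = random.Random(seed)
    n = len(haplotypes[0])

    def fitness(mask):
        # Assumed objective: prediction accuracy minus a cost per tag SNP.
        return loo_accuracy(haplotypes, mask) - penalty * sum(mask) / n

    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Toy haplotypes (rows) over six bi-allelic SNPs (columns); made-up data.
haps = [
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
mask, score = binary_pso(haps)
print("tag SNP mask:", mask, "fitness:", round(score, 3))
```

Because the mask is searched over all SNP positions at once, no prior block partitioning is needed. In this sketch the `penalty` term is what pushes the swarm toward small tag sets, and a real study would replace the toy predictor with the prediction procedure used for evaluation.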