The determination of feature maps, such as STS (sequence-tagged site), SNP (single nucleotide polymorphism) or RFLP (restriction fragment length polymorphism) maps, for each chromosome copy or haplotype in an individual has important potential applications to genetics, clinical biology and association studies. We consider the problem of reconstructing the two haplotypes of a diploid individual from genotype data generated by mapping experiments, and present an algorithm to recover the haplotypes. The problem of optimizing existing methods of SNP phasing with a population of diploid genotypes has been investigated in [V] and found to be NP-hard. In contrast, using single-molecule methods, we show that although the haplotypes are not known and the data are further confounded by the mapping error model, reasonable assumptions on the mapping process allow us to recover the co-associations of allele types across consecutive loci and to estimate the haplotypes with an efficient algorithm. The haplotype reconstruction algorithm has two stages. Stage I detects polymorphic marker types by modifying an EM algorithm for Gaussian mixture models; an example is given for RFLP sizing. Stage II addresses the problem of phasing and presents a local maximum likelihood method for inferring the haplotypes of an individual. The algorithm is nearly linear in the number of polymorphic loci. Results on simulated RFLP sizing data are encouraging and suggest that the method will prove practical for haplotype phasing.