Genetic variants in the size range of ∼1 kb to 10 Mb that alter the quantitative composition of a genome, including insertions, duplications, and deletions, are known as copy-number variants (CNVs). This dissertation develops techniques that address challenges in CNV detection and examines the population genetics of CNVs. To address challenges in assigning integer copy number to CNVs, four independent predictors of probe response are identified, with potential utility in future array designs or data analyses. Additionally, an approach is developed to identify CNV breakpoints within segmental duplications (SDs), extended regions of highly identical sequence, using 17q21.31 deletions as a model locus. To examine the population genetics of CNVs, I characterize the prevalence of copy-number variation and estimate its mutation rate in the general population. Among 2493 apparently normal individuals, large variants were collectively common, with CNVs >500 kb observed in 5%–10% of individuals and variants >1 Mb in 1%–2%. Conversely, correlations between the gene content, size, and frequency of CNVs suggested that such variation is generally deleterious. Underscoring the potential clinical impact of large CNVs, a meta-analysis of individuals with neuropsychiatric disease identified additional CNV loci (3q29, 16p12, and 15q25.2) for further investigation. Examining 386 trios unaffected by neuropsychiatric disease, I observe a genome-wide CNV mutation rate of μ = 1.2 × 10^-2 CNVs per genome per transmission (μ = 6.5 × 10^-3 for CNVs >500 kb), and infer that CNVs >500 kb are, on average, under significant purifying selection (s = 0.16). These observations suggest that large CNVs are fairly common in human populations due to a relatively high mutation rate but are constantly being removed by natural selection.
Demonstrating how deleterious CNVs may manifest, identification of de novo CNVs in 3286 transmissions among 717 multiplex autism pedigrees revealed a fourfold enrichment for de novo CNVs in autism cases versus their unaffected siblings, suggesting that many de novo CNVs contribute a subtle but significant risk for autism. This work extends our ability to study CNVs in regions of the genome previously refractory to analysis and highlights the importance of rare genetic variation in human disease.