A key question in human genetics is what proportion of SNPs modulate a particular phenotype or confer susceptibility to a disease, a quantity termed polygenicity. Previous studies have observed that complex traits tend to be highly polygenic, contrary to the earlier belief that only a handful of SNPs contribute to a trait [1-4]. Beyond these genome-wide estimates, the distribution of polygenicity across genomic regions, as well as the genomic factors that affect regional polygenicity, remains poorly understood. One reason for this gap is that methods for estimating polygenicity rely on SNP effect sizes from genome-wide association studies (GWAS). However, because of linkage disequilibrium (LD) and noise in the GWAS regression, every estimated effect size is non-zero even though not every SNP is truly a susceptibility SNP. Estimating polygenicity from GWAS while accounting for LD requires fully conditioning on the "susceptibility status" of every SNP and explicitly enumerating all possible configurations of susceptibility SNPs. This creates an exponential search space of 2^M configurations, where M is the number of SNPs, which is intractable even when the analysis is restricted to small genomic regions.
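To give a sense of how quickly this search space grows, the sketch below enumerates every susceptibility-status configuration for a small number of SNPs and counts the configurations for larger M. It is an illustration only, not the method of the paper; the function names `enumerate_configurations` and `count_configurations` are hypothetical, and each configuration is assumed to be a binary vector in which 1 marks a susceptibility SNP.

```python
from itertools import product


def enumerate_configurations(m):
    """Return an iterator over all binary status vectors for m SNPs.

    Each vector has one entry per SNP: 1 = susceptibility SNP, 0 = not.
    (Hypothetical helper, for illustration only.)
    """
    return product((0, 1), repeat=m)


def count_configurations(m):
    """Number of configurations for m SNPs, i.e. 2^M."""
    return 2 ** m


# Explicit enumeration is feasible only for very small m.
configs = list(enumerate_configurations(3))
print(len(configs))  # 8 configurations for 3 SNPs

# The count alone shows why enumeration breaks down as m grows.
for m in (10, 50, 100):
    print(m, count_configurations(m))
```

Even a region with 100 SNPs already yields about 1.27 × 10^30 configurations, which is why explicit enumeration is impractical for regional analyses.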