For analyzing complex trait association with sequencing data, most current studies test the aggregated effects of variants in a gene or genomic region. Although gene-based tests have insufficient power even for moderately sized samples, pathway-based analyses combine information across multiple genes in biological pathways and may offer additional insight. However, most existing pathway association methods were originally designed for genome-wide association studies (GWAS) and have not been comprehensively evaluated for sequencing data. Moreover, region-based rare variant association methods, although potentially applicable to pathway-based analysis by extending their region definition to gene sets, have never been rigorously tested in that setting.

In the context of exome-based studies, we use simulated and real data sets to evaluate pathway-based association tests. Our simulation strategy adopts a genome-wide genetic model that distributes total genetic effects hierarchically into pathways, genes, and individual variants, allowing pathway-based methods to be evaluated under realistic, quantifiable assumptions about the underlying genetic architecture.

The results show that, while no single pathway-based association method offers superior performance in all simulated scenarios, a modification of the GSEA approach that uses statistics from single-marker tests without gene-level collapsing (the WKS-Variant method) is consistently powerful. Interestingly, directly applying rare variant association tests (e.g., SKAT) to pathway analysis offers similar power, but the results are sensitive to assumptions about genetic architecture. We applied pathway association analysis to exome sequencing data for chronic obstructive pulmonary disease (COPD) and found that the WKS-Variant method confirms previously published associated genes.
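
To make the idea behind a variant-level, GSEA-style test concrete, the following is a minimal sketch of a weighted Kolmogorov-Smirnov running-sum enrichment score computed directly on single-variant statistics, with no gene-level collapsing. It is an illustration under stated assumptions, not the WKS-Variant implementation evaluated here: the function name `weighted_ks_score`, ranking by absolute statistic, and the weight exponent `p` are all hypothetical choices borrowed from the general GSEA formulation.

```python
from typing import Sequence


def weighted_ks_score(
    stats: Sequence[float],
    in_pathway: Sequence[bool],
    p: float = 1.0,  # assumed weight exponent; p = 0 gives the unweighted KS walk
) -> float:
    """Return the signed maximum deviation of a GSEA-style running sum.

    stats      -- one single-marker test statistic per variant (hypothetical input)
    in_pathway -- True where the variant maps to a gene in the pathway
    """
    if len(stats) != len(in_pathway):
        raise ValueError("stats and in_pathway must have the same length")

    # Rank variants from strongest to weakest evidence (assumption: by |statistic|).
    order = sorted(range(len(stats)), key=lambda i: abs(stats[i]), reverse=True)

    n_hit = sum(1 for flag in in_pathway if flag)
    n_miss = len(stats) - n_hit
    total_hit_weight = sum(abs(stats[i]) ** p for i in order if in_pathway[i])
    if n_hit == 0 or n_miss == 0 or total_hit_weight == 0:
        return 0.0  # score is undefined for empty or all-inclusive sets; report 0

    running = 0.0
    best = 0.0
    for i in order:
        if in_pathway[i]:
            # Step up in proportion to the variant's weighted statistic.
            running += abs(stats[i]) ** p / total_hit_weight
        else:
            # Step down uniformly for variants outside the pathway.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    # Toy data: the two strongest variants belong to the pathway.
    print(weighted_ks_score([5.0, 4.0, 1.0, 0.5], [True, True, False, False]))
```

In a sketch like this the raw score alone says nothing about significance; a GSEA-style analysis would typically compare it against scores recomputed after permuting phenotype labels, which is omitted here.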