The genome-wide association study (GWAS) is a widely used method for locating genomic regions that are associated with complex disease traits. In a GWAS, single nucleotide polymorphisms (SNPs) are collected across the genome and a statistical test is performed for each SNP to identify significant associations, which may provide insight into the genetic basis of disease. In expression quantitative trait loci (eQTL) studies, tens of thousands of gene expression levels are measured and the GWAS approach is applied to each gene expression level as a separate trait. This requires billions of statistical tests and substantial computational resources, particularly when applying newer statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing every SNP. In the first stage, a small number of informative SNPs (proxies) spread across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and tests additional SNPs only from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure speeds up computation by a factor of 10. Additionally, our efficient software implementation speeds up computation by a factor of 75 relative to state-of-the-art testing approaches.
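A minimal sketch of the two-stage idea, for illustration only. It assumes each SNP has already been assigned to a region represented by one proxy SNP, and it uses a simple per-SNP statistic n·r² (approximately chi-square with 1 degree of freedom under the null for a single-SNP linear model) rather than a mixed model. The names `assoc_stat`, `two_stage_scan`, `regions`, `screen_threshold`, and `final_threshold` are hypothetical. In particular, the fixed `screen_threshold` is a stand-in assumption: the paper derives its own criterion for deciding which regions to follow up, and this sketch carries no guarantee of recovering every significant SNP.

```python
from typing import Dict, List, Sequence, Tuple


def assoc_stat(genotypes: Sequence[float], phenotype: Sequence[float]) -> float:
    """Single-SNP association statistic n * r^2 (hypothetical choice).

    Under the null this is approximately chi-square with 1 degree of freedom.
    Monomorphic SNPs or constant phenotypes get a statistic of 0.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotype) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotype))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_p = sum((p - mean_p) ** 2 for p in phenotype)
    if var_g == 0 or var_p == 0:
        return 0.0
    return n * cov * cov / (var_g * var_p)


def two_stage_scan(
    snps: Dict[str, Sequence[float]],
    regions: Dict[str, List[str]],
    phenotype: Sequence[float],
    final_threshold: float,
    screen_threshold: float,
) -> Tuple[Dict[str, float], int]:
    """Hypothetical two-stage scan.

    regions maps each proxy SNP id to the SNP ids in its region (the proxy
    may be listed too). Stage 1 tests every proxy; stage 2 tests the rest of
    a region only if its proxy statistic reaches screen_threshold.
    Returns the significant SNPs with their statistics and the number of
    tests actually performed.
    """
    significant: Dict[str, float] = {}
    n_tests = 0
    for proxy, members in regions.items():
        # Stage 1: test the proxy itself.
        proxy_stat = assoc_stat(snps[proxy], phenotype)
        n_tests += 1
        if proxy_stat >= final_threshold:
            significant[proxy] = proxy_stat
        if proxy_stat < screen_threshold:
            continue  # Region looks null: skip its remaining SNPs.
        # Stage 2: test the other SNPs in this promising region.
        for snp in members:
            if snp == proxy:
                continue
            stat = assoc_stat(snps[snp], phenotype)
            n_tests += 1
            if stat >= final_threshold:
                significant[snp] = stat
    return significant, n_tests
```

The savings come from the `continue` branch: every region whose proxy looks null costs one test instead of one test per SNP. With two regions of two SNPs each, where only the first region's proxy is associated with the phenotype, the scan performs three tests instead of four.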
Full details are described in "Efficiently Identifying Significant Associations in Genome-wide Association Studies."