PhenIMP (Phenotype Imputation)

Imputing a hard-to-collect phenotype for GWAS using its correlation structure with easier-to-access proxy phenotypes

Download

The R code includes the following three R functions.

phen.imp : Impute phenotypes.
linreg.meta : After phenotype imputation, run simple linear regression and the optimal meta-analysis of observed and imputed phenotypes.
phen.imp.summary : Perform a weighted combination of summary statistics when only summary statistics are available.

See below for how to use each function.

phen.imp

Input:

Y1 is an m x p phenotype matrix of m individuals and p phenotypes, with no missing data.
Y2 is an n x p phenotype matrix of n individuals and p phenotypes, with missing values ("NA") only at index imp.index.
imp.index is the index of the phenotype to be imputed.
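As a rough illustration of the input layout described above, the following sketch builds toy matrices of the expected shapes. The sizes, the random data, and the choice of imp.index are made up for illustration, and it assumes that the whole column at imp.index is missing in Y2.

```r
set.seed(1)

# Hypothetical sizes: m reference individuals, n target individuals, p phenotypes
m <- 200
n <- 300
p <- 4
imp.index <- 1  # hypothetical choice of the phenotype to impute

# Y1: all p phenotypes observed for every individual
Y1 <- matrix(rnorm(m * p), nrow = m, ncol = p)

# Y2: same p phenotypes, but the column at imp.index is not collected
Y2 <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y2[, imp.index] <- NA

# Basic checks on the layout before imputation
stopifnot(ncol(Y1) == ncol(Y2))
stopifnot(!anyNA(Y1))
stopifnot(all(is.na(Y2[, imp.index])))
stopifnot(!anyNA(Y2[, -imp.index]))
```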

Output:

Y2.imputed is the full Y2 matrix with the imputed phenotype filled in at imp.index.
imputed is the imputed phenotype vector.
Rmat is the phenotypic correlation matrix.
Rvec is the correlation vector between the phenotype at imp.index and the other phenotypes.
Sigma.inv is the inverse of the submatrix of the correlation matrix excluding imp.index.
r.imp is the correlation needed for meta-analysis and power calculation.
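To make the relationship between these outputs concrete, here is a minimal sketch of one way such quantities could be computed: a conditional-mean imputation under a multivariate normal model. This is not the distributed phen.imp code. The function name sketch.phen.imp, the standardization by Y1 means and standard deviations, and the formula r.imp = sqrt(Rvec' Sigma.inv Rvec) are assumptions made for illustration.

```r
# Hypothetical helper, not the packaged phen.imp function
sketch.phen.imp <- function(Y1, Y2, imp.index) {
  Rmat <- cor(Y1)                                   # phenotypic correlation matrix
  Rvec <- Rmat[imp.index, -imp.index]               # target vs. proxy correlations
  Sigma.inv <- solve(Rmat[-imp.index, -imp.index, drop = FALSE])

  # Assumption: standardize the proxies in Y2 with means and sds taken from Y1
  mu <- colMeans(Y1)
  s <- apply(Y1, 2, sd)
  Z2 <- scale(Y2[, -imp.index, drop = FALSE],
              center = mu[-imp.index], scale = s[-imp.index])

  # Conditional mean of the target given the proxies, on the standardized scale
  w <- as.vector(Sigma.inv %*% Rvec)
  imputed.std <- as.vector(Z2 %*% w)
  imputed <- mu[imp.index] + s[imp.index] * imputed.std

  Y2.imputed <- Y2
  Y2.imputed[, imp.index] <- imputed

  # Assumed definition: expected correlation between imputed and true phenotype
  r.imp <- sqrt(sum(Rvec * w))

  list(Y2.imputed = Y2.imputed, imputed = imputed, Rmat = Rmat,
       Rvec = Rvec, Sigma.inv = Sigma.inv, r.imp = r.imp)
}

# Toy data: three phenotypes sharing one latent factor
set.seed(42)
simulate <- function(k) {
  f <- rnorm(k)
  cbind(f + rnorm(k), f + rnorm(k), f + rnorm(k))
}
Y1 <- simulate(500)
Y2.full <- simulate(400)   # truth kept only to check the sketch
Y2 <- Y2.full
Y2[, 1] <- NA

out <- sketch.phen.imp(Y1, Y2, imp.index = 1)
observed.r <- cor(out$imputed, Y2.full[, 1])  # compare with out$r.imp
```

In this toy setup the empirical correlation between the imputed and the held-out true phenotype should land near out$r.imp, which is why that single number can stand in for imputation accuracy downstream.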

linreg.meta

Input:

Y1 is an m x p phenotype matrix of m individuals and p phenotypes, with no missing data.
Y2.imputed is an n x p phenotype matrix of n individuals and p phenotypes, with the imputed phenotype at imp.index.
imp.index is the index of the phenotype that was imputed.
r.imp is the imputation accuracy r, taken from the output of the phen.imp function.
X1 is an m x g genotype matrix of m individuals at g SNPs.
covar1 is an m x k covariate matrix of m individuals and k covariates.
X2 is an n x g genotype matrix of n individuals at g SNPs.
covar2 is an n x k covariate matrix of n individuals and k covariates.
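The inputs above suggest a two-step analysis: a per-SNP linear regression in each dataset, then a combination of the two sets of z-scores that down-weights the imputed dataset by r.imp. The sketch below shows one plausible form of this. The helper names sketch.snp.z and sketch.meta.z are hypothetical, and the weighting (sqrt(m) for observed data, r.imp * sqrt(n) for imputed data) is an assumption based on treating r.imp^2 * n as the effective sample size, not a transcription of linreg.meta.

```r
# Hypothetical helper: t-statistic of one SNP in a linear regression with covariates
sketch.snp.z <- function(y, x, covar = NULL) {
  fit <- if (is.null(covar)) lm(y ~ x) else lm(y ~ x + covar)
  coef(summary(fit))["x", "t value"]
}

# Hypothetical helper: weighted z-score combination.
# Assumption: imputed data contribute an effective sample size of r.imp^2 * n.
sketch.meta.z <- function(z1, z2, m, n, r.imp) {
  (sqrt(m) * z1 + r.imp * sqrt(n) * z2) / sqrt(m + r.imp^2 * n)
}

# Toy data with made-up sizes and effect sizes
set.seed(7)
m <- 300; n <- 300; g <- 5
X1 <- matrix(rbinom(m * g, 2, 0.3), m, g)
X2 <- matrix(rbinom(n * g, 2, 0.3), n, g)
covar1 <- matrix(rnorm(m), m, 1)
covar2 <- matrix(rnorm(n), n, 1)
y1 <- 0.3 * X1[, 1] + rnorm(m)   # stands in for Y1[, imp.index]
y2 <- 0.3 * X2[, 1] + rnorm(n)   # stands in for Y2.imputed[, imp.index]
r.imp <- 0.6                     # hypothetical imputation accuracy

z1 <- sapply(seq_len(g), function(j) sketch.snp.z(y1, X1[, j], covar1))
z2 <- sapply(seq_len(g), function(j) sketch.snp.z(y2, X2[, j], covar2))

z.meta <- sketch.meta.z(z1, z2, m, n, r.imp)
p.meta <- 2 * pnorm(-abs(z.meta))   # two-sided p-values per SNP
```

Under this weighting, r.imp = 0 reduces the result to the observed-data z-scores alone, and r.imp = 1 recovers an ordinary sample-size-weighted meta-analysis.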

Output:

phen.imp.summary

Input:

zscore1 is the z-score from dataset1, where the phenotype is collected.
m is the number of individuals in dataset1.
zscores2 is the vector of z-scores from dataset2, where the phenotype is not collected. These z-scores are for the (p-1) phenotypes other than the missing phenotype.
n is the number of individuals in dataset2.
Sigma.inv is the inverse of the submatrix of the phenotypic correlation matrix, excluding the missing phenotype (estimated from dataset1).
Rvec is the correlation vector between the to-be-imputed phenotype and the other phenotypes (estimated from dataset1, since the phenotype is not collected in dataset2).
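When only summary statistics are available, the same idea can be applied to z-scores directly. The sketch below is one plausible form of such a weighted combination, not the distributed phen.imp.summary code: the function name sketch.phen.imp.summary, the returned fields, and the formulas are assumptions. It relies on the fact that, under the null, the proxy z-scores have correlation equal to the proxy phenotype correlation, so the weighted sum Rvec' Sigma.inv zscores2 has variance Rvec' Sigma.inv Rvec.

```r
# Hypothetical helper, not the packaged phen.imp.summary function
sketch.phen.imp.summary <- function(zscore1, m, zscores2, n, Sigma.inv, Rvec) {
  w <- as.vector(Sigma.inv %*% Rvec)   # imputation weights for the proxies
  r2 <- sum(Rvec * w)                  # assumed squared imputation accuracy
  r.imp <- sqrt(r2)

  # z-score of the (implicitly) imputed phenotype, rescaled to unit variance
  z.imp <- sum(w * zscores2) / r.imp

  # Assumed weighting: effective sample size of the imputed data is r.imp^2 * n
  z.meta <- (sqrt(m) * zscore1 + r.imp * sqrt(n) * z.imp) / sqrt(m + r2 * n)

  list(z.imp = z.imp, z.meta = z.meta, r.imp = r.imp)
}

# Toy inputs: two proxy phenotypes, all pairwise correlations 0.5 (made-up values)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
Sigma.inv <- solve(Sigma)
Rvec <- c(0.5, 0.5)

res <- sketch.phen.imp.summary(zscore1 = 1.5, m = 1000,
                               zscores2 = c(2, 1), n = 4000,
                               Sigma.inv = Sigma.inv, Rvec = Rvec)
p.meta <- 2 * pnorm(-abs(res$z.meta))   # two-sided p-value
```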

Output:

Future Plan:

We also plan to implement some of our approaches in METASOFT, a widely used tool for meta-analysis of GWAS.

Hormozdiari et al., “Imputing phenotypes for genome-wide association studies”. Under review.

Farhad Hormozdiari: farhad.hormozdiari (AT) gmail.com

Buhm Han : buhan (AT) amc.seoul.kr