PhenIMP (Phenotype Imputation)

Imputing a hard-to-collect phenotype for GWAS using its correlation structure with easier-to-access proxy phenotypes

Download

Version/bug info

  • v0.1.0 (2016-03-17) Initial prototype R code deployed

User's guide

The R code includes the following three R functions.

  1. phen.imp : Impute phenotypes.

  2. linreg.meta : After phenotype imputation, perform simple linear regression and then the optimal meta-analysis of the observed and imputed phenotypes.

  3. phen.imp.summary : Perform a weighted combination of summary statistics, for use when only summary statistics are available.

See below for how to use each function.

phen.imp

Input:

  • Y1 is an m x p phenotype matrix of m individuals and p phenotypes, with no missing data.

  • Y2 is an n x p phenotype matrix of n individuals and p phenotypes, with missing values (NA) only in column imp.index.

  • imp.index is the index of the phenotype to be imputed.

Output:

  • Y2.imputed is the full Y2 matrix with the imputed phenotype filled in

  • imputed is the imputed phenotype vector

  • Rmat is the phenotypic correlation matrix

  • Rvec is the correlation vector between the phenotype at imp.index and the other phenotypes

  • Sigma.inv is the inverse of the submatrix of the correlation matrix excluding imp.index

  • r.imp is the imputation accuracy r, a correlation needed for meta-analysis and power calculation
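The outputs listed above can be illustrated with a minimal sketch. This is not the distributed phen.imp code: the function name phen.imp.sketch is hypothetical, and the sketch assumes conditional-mean imputation under a multivariate normal model, with the correlation structure and the means and standard deviations all estimated from Y1. The distributed function may differ in details such as standardization.

```r
# Hypothetical sketch, NOT the distributed phen.imp implementation.
# Assumption: conditional-mean imputation under a multivariate normal model,
# with all reference moments (means, SDs, correlations) estimated from Y1.
phen.imp.sketch <- function(Y1, Y2, imp.index) {
  Rmat <- cor(Y1)                                   # p x p phenotypic correlation matrix
  Rvec <- Rmat[-imp.index, imp.index]               # target vs. proxy correlations
  Sigma.inv <- solve(Rmat[-imp.index, -imp.index, drop = FALSE])
  w <- as.vector(Sigma.inv %*% Rvec)                # imputation weights
  mu <- colMeans(Y1)
  sdev <- apply(Y1, 2, sd)
  # Standardize the observed proxies in Y2 using the Y1 moments
  Z2 <- scale(Y2[, -imp.index, drop = FALSE],
              center = mu[-imp.index], scale = sdev[-imp.index])
  # Imputed phenotype, mapped back to the original scale of the target
  imputed <- mu[imp.index] + sdev[imp.index] * as.vector(Z2 %*% w)
  Y2.imputed <- Y2
  Y2.imputed[, imp.index] <- imputed
  r.imp <- sqrt(sum(Rvec * w))                      # sqrt(Rvec' Sigma.inv Rvec)
  list(Y2.imputed = Y2.imputed, imputed = imputed, Rmat = Rmat,
       Rvec = Rvec, Sigma.inv = Sigma.inv, r.imp = r.imp)
}

# Simulated example: p correlated phenotypes sharing one latent factor
set.seed(1)
m <- 500; n <- 300; p <- 4
make.pheno <- function(k) {
  shared <- rnorm(k)
  sapply(1:p, function(j) shared + rnorm(k))
}
Y1 <- make.pheno(m)
Y2 <- make.pheno(n)
truth <- Y2[, 1]        # kept aside only to check the imputation
Y2[, 1] <- NA           # phenotype 1 is not collected in dataset 2
res <- phen.imp.sketch(Y1, Y2, imp.index = 1)
```

In this simulation, cor(res$imputed, truth) can be compared with res$r.imp to see how well the accuracy estimated from Y1 carries over to Y2.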

linreg.meta

Input:

  • Y1 is an m x p phenotype matrix of m individuals and p phenotypes, with no missing data.

  • Y2.imputed is an n x p phenotype matrix of n individuals and p phenotypes, with the imputed phenotype at imp.index.

  • imp.index is the index of the phenotype that was imputed.

  • r.imp is the imputation accuracy r, taken from the output of the phen.imp function.

  • X1 is an m x g genotype matrix of m individuals at g SNPs.

  • covar1 is an m x k covariate matrix of m individuals and k covariates.

  • X2 is an n x g genotype matrix of n individuals at g SNPs.

  • covar2 is an n x k covariate matrix of n individuals and k covariates.

Output:

  • a matrix with z-scores and p-values
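The two steps described for linreg.meta can be sketched for a single SNP as follows. The helper names snp.zscore and meta.zscore are hypothetical, and the weighting is an assumption: the sketch weights the observed-phenotype z-score by sqrt(m) and the imputed-phenotype z-score by r.imp * sqrt(n), which may not match the distributed code exactly.

```r
# Hypothetical helpers, NOT the distributed linreg.meta implementation.

# Z-score of the SNP effect from a linear regression of y on x (plus covariates).
snp.zscore <- function(y, x, covar = NULL) {
  dat <- data.frame(y = y, x = x)
  if (!is.null(covar)) dat <- cbind(dat, as.data.frame(covar))
  fit <- lm(y ~ ., data = dat)
  # The t-statistic is used as an approximate z-score (large samples)
  unname(coef(summary(fit))["x", "t value"])
}

# Assumed weighting: sqrt(m) for the observed phenotype and
# r.imp * sqrt(n) for the imputed phenotype.
meta.zscore <- function(z1, z2, m, n, r.imp) {
  w1 <- sqrt(m)
  w2 <- r.imp * sqrt(n)
  z <- (w1 * z1 + w2 * z2) / sqrt(w1^2 + w2^2)
  c(z = z, p = 2 * pnorm(-abs(z)))
}

# Simulated single-SNP example
set.seed(2)
m <- 400; n <- 800
x1 <- rbinom(m, 2, 0.3); y1 <- 0.2 * x1 + rnorm(m)          # observed phenotype
x2 <- rbinom(n, 2, 0.3); y2.imputed <- 0.1 * x2 + rnorm(n)  # stands in for an imputed phenotype
z1 <- snp.zscore(y1, x1)
z2 <- snp.zscore(y2.imputed, x2)
out <- meta.zscore(z1, z2, m, n, r.imp = 0.6)
```

Applying snp.zscore to each column of X1 and X2 and combining the results row by row would give a matrix of z-scores and p-values of the kind described above.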

phen.imp.summary

Input:

  • zscore1 is the z-score from dataset1, where the phenotype is collected.

  • m is the number of individuals in dataset1.

  • zscores2 is the vector of z-scores from dataset2, where the phenotype is not collected. These z-scores are for the (p-1) other phenotypes, excluding the missing phenotype.

  • n is the number of individuals in dataset2.

  • Sigma.inv is the inverse of the submatrix of the phenotypic correlation matrix excluding the missing phenotype (estimated from dataset1).

  • Rvec is the correlation vector between the to-be-imputed phenotype and the other phenotypes (estimated from dataset1, since the phenotype is not collected in dataset2).

Output:

  • z-score and p-value of the weighted combination of statistics approach
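A summary-statistics version can be sketched in the same spirit. The function name phen.imp.summary.sketch is hypothetical, and the formula is an assumption: the proxy z-scores are combined with the weights Sigma.inv %*% Rvec and rescaled by r = sqrt(Rvec' Sigma.inv Rvec) to approximate the z-score of the imputed phenotype, which is then meta-analyzed with zscore1 using weights sqrt(m) and r * sqrt(n). The distributed function may use a different formulation.

```r
# Hypothetical sketch, NOT the distributed phen.imp.summary implementation.
phen.imp.summary.sketch <- function(zscore1, m, zscores2, n, Sigma.inv, Rvec) {
  w <- as.vector(Sigma.inv %*% Rvec)        # imputation weights
  r.imp <- sqrt(sum(Rvec * w))              # assumed imputation accuracy
  # Approximate z-score of the imputed phenotype from the proxy z-scores
  z2.imp <- sum(w * zscores2) / r.imp
  # Weighted combination with the observed-phenotype z-score
  z <- (sqrt(m) * zscore1 + r.imp * sqrt(n) * z2.imp) / sqrt(m + r.imp^2 * n)
  c(z = z, p = 2 * pnorm(-abs(z)))
}

# Illustrative inputs (made-up numbers): phenotype 1 is the missing one
Rmat <- matrix(c(1.0, 0.5, 0.4,
                 0.5, 1.0, 0.3,
                 0.4, 0.3, 1.0), nrow = 3, byrow = TRUE)
Sigma.inv <- solve(Rmat[-1, -1])
Rvec <- Rmat[-1, 1]
res <- phen.imp.summary.sketch(zscore1 = 2.1, m = 1000,
                               zscores2 = c(1.5, 0.8), n = 4000,
                               Sigma.inv = Sigma.inv, Rvec = Rvec)
```

The correlation matrix and z-scores here are invented for illustration only and do not come from any real dataset.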

Future Plan:

We also plan to implement some of our approaches in METASOFT, a widely used tool for meta-analysis of GWAS.

Publication

Hormozdiari et al., “Imputing phenotypes for genome-wide association studies”. Under review.

Contact

Farhad Hormozdiari: farhad.hormozdiari (AT) gmail.com

Buhm Han: buhan (AT) amc.seoul.kr