Mixed Model Coexpression (MMC)

How do I install this code?
It's easy. Download the code, unpack it, and load it into R with the source() function. For example:
# tar -xvzf mmc.beta.01.tar.gz
# cd mmc.beta.01
# R
> # in R now
> source('R/mmc.R')
Is there a simple example and some sample data to show me how to use this code?
Of course. In R, do the following:
> source('R/mmc.R')
> exprs = as.matrix(read.table('exprs.tab'))
> exprs.ncov = mmc.ncov(exprs)              # get normalized covariance matrix
> exprs.cor = mmc.cor(exprs, exprs.ncov)    # an nrow(exprs) x nrow(exprs) correlation matrix
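If exprs.tab is not at hand, a small synthetic data set in the same orientation can be generated in base R. The layout is inferred from the example above: the result is an nrow(exprs) x nrow(exprs) correlation matrix, so rows are genes and columns are individuals. The file name exprs_sim.tab, the dimensions, and the hidden sample-level factor are hypothetical choices made only for illustration. The shared factor loosely mimics expression heterogeneity by inducing correlation between genes that are otherwise unrelated.

```r
# Hypothetical synthetic data: 20 genes (rows) x 30 individuals (columns).
set.seed(1)
n_genes <- 20
n_samples <- 30

# A hidden sample-level factor shared by all genes (hypothetical), which
# induces spurious correlation between otherwise independent genes.
hidden <- rnorm(n_samples)
loadings <- rnorm(n_genes)
exprs_sim <- outer(loadings, hidden) +
  matrix(rnorm(n_genes * n_samples), n_genes, n_samples)
rownames(exprs_sim) <- paste0("gene", seq_len(n_genes))
colnames(exprs_sim) <- paste0("ind", seq_len(n_samples))

# write.table() puts gene names in the first column and a header row of
# individual IDs; read.table() detects this layout and uses the first
# column as row names.
path <- file.path(tempdir(), "exprs_sim.tab")
write.table(exprs_sim, path, sep = "\t", quote = FALSE)
exprs <- as.matrix(read.table(path))
```

The resulting exprs can be passed to mmc.ncov and mmc.cor as in the example above.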
What is the difference between mmc.cor and mmc.corsym?
The details are given in Section 2.2 of the paper. mmc.cor implements the method described in the bulk of that section; mmc.corsym is the variant mentioned in its last paragraph. In mmc.cor, coexpression is calculated by fitting a linear mixed model in which one gene is the response and the other is the predictor. This assumes that the estimated variance components are the same no matter which gene is the response and which is the predictor. In many cases this holds. When it does not, the calculated coexpression may be incorrect, and you should use mmc.corsym instead: it fits the model both ways and corrects each gene independently. The drawback is that mmc.corsym runs much slower.
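The difference can be illustrated with a base R sketch. Write the model with gene y as the response and gene x as the predictor as y = mu + beta x + u + e, with u ~ N(0, sg2 K) and e ~ N(0, se2 I). This standard form is an assumption here, with K standing in for a matrix such as exprs.ncov. A generalized least squares fit then whitens both vectors with Sigma^(-1/2), where Sigma = sg2 K + se2 I. The sketch also assumes that whitened_cor applies one Sigma to both genes (the mmc.cor-style assumption) and that whitened_cor_sym whitens each gene with its own variance components (the mmc.corsym-style idea). All helper names are hypothetical and not part of MMC. The variance components are passed in as fixed numbers rather than estimated, so this only illustrates the description above and is not the package's implementation.

```r
# Inverse symmetric square root of a positive-definite matrix via eigendecomposition
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Whiten v with W, then project out the whitened intercept column by least squares
whiten_resid <- function(v, W) {
  wv <- W %*% v
  w1 <- W %*% rep(1, length(v))
  wv - w1 %*% (crossprod(w1, wv) / crossprod(w1))
}

# Hypothetical helper: signed coexpression after whitening both genes with a
# single Sigma = sg2 * K + se2 * I
whitened_cor <- function(x, y, K, sg2, se2) {
  W <- inv_sqrt(sg2 * K + se2 * diag(length(x)))
  rx <- whiten_resid(x, W)
  ry <- whiten_resid(y, W)
  sum(rx * ry) / sqrt(sum(rx^2) * sum(ry^2))
}

# Hypothetical symmetric variant: vc_x and vc_y are c(sg2, se2) for each gene,
# and each gene is whitened with its own covariance matrix
whitened_cor_sym <- function(x, y, K, vc_x, vc_y) {
  n <- length(x)
  rx <- whiten_resid(x, inv_sqrt(vc_x[1] * K + vc_x[2] * diag(n)))
  ry <- whiten_resid(y, inv_sqrt(vc_y[1] * K + vc_y[2] * diag(n)))
  sum(rx * ry) / sqrt(sum(rx^2) * sum(ry^2))
}

# Example with simulated data; K is a hypothetical positive semi-definite
# sample-similarity matrix
set.seed(2)
n <- 25
A <- matrix(rnorm(n * 5), n, 5)
K <- tcrossprod(A) / 5
x <- rnorm(n)
y <- x + rnorm(n)
r_single <- whitened_cor(x, y, K, sg2 = 0.5, se2 = 0.5)
r_sym <- whitened_cor_sym(x, y, K, vc_x = c(0.5, 0.5), vc_y = c(0.2, 0.8))
```

With sg2 = 0 and se2 = 1 the whitening matrix is the identity, and whitened_cor reduces to the ordinary Pearson correlation. The two versions agree only when both genes have the same variance components, which mirrors the assumption described above.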
What is the faster option in mmc.cor?
If faster=TRUE, the function uses a quicker method to calculate the coexpression values. In practice, the absolute values of the coexpressions obtained with the two methods are very close (they differ by less than 1e-10). The drawback is that the fast method does not give the direction of the relationship (i.e., all coexpressions are positive). A good strategy may be to calculate all pairwise coexpressions with the fast method, select the interesting ones, and then apply the slower method to those to recover the direction. The same applies to the symmetric function, mmc.corsym.
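The screen-then-refine strategy described above can be sketched in base R. refine_top_pairs is a hypothetical helper and not part of MMC. It takes any matrix of unsigned coexpressions, keeps the k strongest gene pairs, and calls a user-supplied function to recompute a signed value for each pair. In the example, ordinary Pearson correlation on simulated data stands in for both the fast and the slow MMC computations.

```r
# Hypothetical helper: rank gene pairs by unsigned coexpression and recompute
# a signed value only for the top k pairs.
refine_top_pairs <- function(abs_mat, k, signed_fun) {
  idx <- which(upper.tri(abs_mat), arr.ind = TRUE)  # each unordered pair once
  scores <- abs_mat[idx]
  keep <- order(scores, decreasing = TRUE)[seq_len(min(k, nrow(idx)))]
  top <- idx[keep, , drop = FALSE]
  data.frame(
    i = top[, 1],
    j = top[, 2],
    unsigned = abs_mat[top],
    signed = mapply(signed_fun, top[, 1], top[, 2])
  )
}

# Simulated example: genes 1 and 2 are strongly negatively related
set.seed(3)
X <- matrix(rnorm(6 * 40), 6, 40)
X[2, ] <- -X[1, ] + 0.1 * rnorm(40)
res <- refine_top_pairs(abs(cor(t(X))), k = 3,
                        signed_fun = function(i, j) cor(X[i, ], X[j, ]))
```

With MMC loaded, abs_mat might come from mmc.cor(exprs, exprs.ncov, faster = TRUE) and signed_fun might be function(i, j) mmc.cor(exprs[c(i, j), ], exprs.ncov)[1, 2]. Both calls assume the argument order shown in the first example and that the fast mode returns a full nrow(exprs) x nrow(exprs) matrix.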
Can I add other covariates (e.g., age, sex) to the coexpression calculation?
In theory, yes. Unfortunately, I ran into some unexpected issues when implementing this part, so for the time being the feature is not available.
How stable is this code?
It's fairly stable, but still in beta. We used a different version for the paper, and I am actively working on improving the R version. Please report any bugs to Nick Furlotte (email below).
What about population structure?
There are two population-structure issues that might be problematic. The first arises when the expression data are influenced by both expression heterogeneity and population structure. In this case, MMC can correct for only one of them. Recent work has addressed this issue for eQTL studies, but so far no one has attempted to address it for coexpression. We are thinking about it.
The other issue arises when there is no expression heterogeneity, only population structure. For example, if you substitute metabolic traits for expression traits, you no longer expect confounding due to expression heterogeneity, but you might still expect it due to population structure. In this case, MMC can calculate the correlation between metabolic traits while correcting for population structure. It's as simple as replacing exprs.ncov (from above) with a matrix representing the population structure. For this purpose we have included the function emma.kinship from the EMMA package. It takes a matrix of SNPs in which each row is a SNP and each of the N columns holds one individual's coded genotype (0, 0.5 or 1), and it returns an N x N kinship matrix. Here is an example. Assume we have an m x N matrix phenos containing m phenotypes for N individuals, and an M x N matrix SNPs containing M SNPs for the same individuals.
> K = emma.kinship(SNPs)
> phenos.cor = mmc.cor(phenos, K)

We have not extensively tested this particular use of the program.
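For intuition about what goes into K, here is a simple identity-by-state (IBS) style kinship in base R. ibs_kinship is a hypothetical helper written for illustration, in the spirit of emma.kinship but not a copy of it. It assumes the same SNP-by-individual layout with genotypes coded 0, 0.5 or 1 and assumes there are no missing values. It scores a pair of individuals a and b by the average over SNPs of g_a * g_b + (1 - g_a) * (1 - g_b). Matching homozygotes therefore score 1, opposite homozygotes score 0, and any pair involving a heterozygote scores 0.5.

```r
# Hypothetical IBS-style kinship; assumes no missing genotypes.
ibs_kinship <- function(snps) {
  snps <- as.matrix(snps)  # rows = SNPs, columns = individuals
  # crossprod(A) is t(A) %*% A, so both terms are N x N
  K <- (crossprod(snps) + crossprod(1 - snps)) / nrow(snps)
  diag(K) <- 1  # assumption: self-similarity set to 1
  K
}

# Toy input: 3 SNPs (rows) x 3 individuals (columns);
# individuals 1 and 2 have identical genotypes
SNPs <- matrix(c(0, 1, 0.5,
                 0, 1, 0.5,
                 1, 0, 0.5), nrow = 3)
K <- ibs_kinship(SNPs)
```

The result could then be used wherever the example above uses K, as in mmc.cor(phenos, K).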

Contact: Nick Furlotte (nfurlott at cs dot ucla dot edu)