Instruction step 3: Run MetaTissueMM (Mixed Model) to obtain estimates of effects

The third step in Meta-Tissue is to run software called MetaTissueMM to compute estimates of effects and their standard errors. These estimates are required for the meta-analysis we will perform in Step 4. MetaTissueMM computes the estimates by taking into account the fact that multiple tissues are collected from the same individual.


MetaTissueMM is a Linux program and cannot be run on Windows or Mac. Its usage is shown below. Note that since v0.3, all options have been changed to long options.

MetaTissueMM [options]

[Required options]
  --expr [gene expression file]          : File specified with -p option in Step 2
  --geno [genotype file]                 : File specified with -q option in Step 2
  --matrix [tissue sharing matrix file]  : File specified with -r option in Step 2
  --output [output prefix]               : Prefix for output files (more info below)
  --metatissue_bin_path [Metasoft dir]   : Path to Metasoft folder (more info below)

[Optional arguments]
  --cov [covariate file]                 : Covariate file (file specified with -s in Step 2)
  --dosage                               : Turn on dosage mode
  --help                                 : Print help message
  --java_path [java path]                : Path to java program (default: java)
  --bash_path [bash path]                : Path to bash (default: /bin/bash)
  --tbtonly                              : Perform Tissue-By-Tissue only and skip Meta-Tissue
  --old_random_effects                   : Use the previous random effects (RE) model discussed in the MetaTissue paper (default: not set)
  --no_mvalue                            : Metasoft does not compute m-value for each tissue (default: not set)
  --start_snp_index [index]              : 0-based start index, inclusive (default: 0)
  --end_snp_index [index]                : 0-based end index, exclusive (default: # of SNPs)
  --n_digits [# of digits]               : # of digits to print in output files (default: 7).
                                           This option may change the size of output files dramatically.
  --cisonly [cis length]                 : Perform only cis-analysis, with the cis region defined by length.
                                           --cisonly 1000000 performs cis-analysis between a SNP and probes whose
                                           positions fall within [-1Mb,+1Mb] of the SNP position (default: not set)
  --savebeta                             : Do not delete beta and correlation files from MetaTissueMM (default: not set).
                                           Setting this option increases the size of output files dramatically.
  --heuristic                            : Enable the heuristic algorithm to scale standard errors. See Text S2 in the paper (default: not set).
                                           Enabling the heuristic increases power, but slight inflation may be observed at modest/significant p-values.
  --full_metasoft_output                 : Metasoft outputs all of its statistics. Setting this option may increase
                                           the output file size significantly (default: not set)
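The --cisonly description implies a simple distance test: with --cisonly 1000000, a probe is analyzed with a SNP only if the probe position lies within 1 Mb on either side of the SNP position. The sketch below reproduces that arithmetic for illustration only. The function name in_cis_window is hypothetical, and both the assumption that the two positions are on the same chromosome and the assumption that both window ends are inclusive (matching the [-1Mb,+1Mb] notation) are ours, not taken from MetaTissueMM itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: is a probe inside the cis window of a SNP?
# Assumes both positions are on the same chromosome and that the window
# [snp_pos - cis_len, snp_pos + cis_len] is inclusive at both ends.
in_cis_window() {
  local snp_pos=$1 probe_pos=$2 cis_len=$3
  local dist=$(( probe_pos - snp_pos ))
  (( dist < 0 )) && dist=$(( -dist ))   # absolute distance in bp
  (( dist <= cis_len ))                 # exit status 0 means "inside"
}

# Example: SNP at position 1,000,000 with --cisonly 1000000 (1 Mb)
for probe in 1500000 2000000 2000001; do
  if in_cis_window 1000000 "$probe" 1000000; then
    echo "$probe inside"
  else
    echo "$probe outside"
  fi
done
```

Under these assumptions the loop reports 1500000 and 2000000 as inside the window and 2000001 as outside.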


Here is more information on the parameters above.

--start_snp_index [index] --end_snp_index [index]

  When running MetaTissueMM on a large GWAS dataset (more than a thousand samples across all tissues and more than 500K SNPs), it is recommended that you parallelize the analysis on a cluster of nodes. Users can do this by running each MetaTissueMM job on a subset of SNPs (but on all gene expression probes); the "--start_snp_index" and "--end_snp_index" options specify which SNPs to analyze. The index starts at 0 (zero), and "--start_snp_index" is inclusive while "--end_snp_index" is exclusive. This means that to run MetaTissueMM on the first 1,000 SNPs, users specify "--start_snp_index 0 --end_snp_index 1000"; for the next 1,000 SNPs, they specify "--start_snp_index 1000 --end_snp_index 2000", and so on.
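The index arithmetic above can be scripted so that each chunk becomes a separate cluster job. The sketch below prints the index options for every chunk and then echoes one MetaTissueMM command line per chunk as a dry run. The function name snp_chunks, the example SNP count and chunk size, and the "<required options>" placeholder are assumptions for illustration; how jobs are actually submitted depends on your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 0-based [start, end) SNP index ranges that split
# n_snps SNPs into chunks of at most chunk_size SNPs (last chunk may be smaller).
snp_chunks() {
  local n_snps=$1 chunk_size=$2 start end
  for (( start = 0; start < n_snps; start += chunk_size )); do
    end=$(( start + chunk_size ))
    (( end > n_snps )) && end=$n_snps   # --end_snp_index is exclusive
    echo "--start_snp_index $start --end_snp_index $end"
  done
}

# Dry run: 2,500 SNPs in chunks of 1,000 (numbers chosen for illustration).
# Replace "echo" with your cluster's job-submission command and fill in the
# required options (--expr, --geno, --matrix, --output, --metatissue_bin_path).
snp_chunks 2500 1000 | while read -r range; do
  echo "MetaTissueMM <required options> $range"
done
```

Because each chunk's output files are named with its start index, the jobs do not overwrite one another's results when they share an output prefix.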

--output [output prefix]

  MetaTissueMM generates 6 files as output.

1. [output prefix].SNP.[start_SNP_index].mm.beta.std.txt.gz

This is the input file to Metasoft. It contains estimates of effects and their standard errors for pairs of SNP and gene expression. Each line corresponds to one such pair and covers all tissues.

2. [output prefix].SNP.[start_SNP_index].metasoft.sh

This shell script runs Metasoft automatically on the above input file. It is discussed further in Step 4.

3. [output prefix].SNP.[start_SNP_index].tbt.ps.txt.gz

This is the output of the "Tissue-By-Tissue" approach, which computes a p-value for each tissue separately. Each line corresponds to one pair of SNP and gene expression and has (T+1) columns: an ID in the first column followed by p-values for the T tissues. The p-values are in the same order as the tissues in the tissue information file. Users should use this result to detect tissue-specific eQTLs (for more information, refer to our manuscript).

4. [output prefix].SNP.[start_SNP_index].mm.log

This is a log file of MetaTissueMM.

5. [output prefix].SNP.[start_SNP_index].mm.corr.txt.gz

This contains the correlations among the beta (effect) estimates.

6. [output prefix].SNP.[start_SNP_index].mm.sigmag.txt.gz

This contains the variance explained by the random effects. It is used only when --heuristic is enabled.
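Because the Tissue-By-Tissue file has the pair ID in column 1 and the p-value for tissue k in column k+1, standard command-line tools can filter it. The sketch below prints the pairs whose p-value in one tissue falls below a threshold. The function name tbt_hits and the example threshold are hypothetical, and so are the assumptions that columns are whitespace-separated and that entries which are not plain numbers (such as a header line or a missing value) should be skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pairs from a Tissue-By-Tissue p-value file whose
# p-value in tissue k (1-based, same order as the tissue information file)
# is below a threshold.
# Assumes whitespace-separated columns: ID in column 1, then T p-values.
tbt_hits() {
  local file=$1 tissue=$2 threshold=$3
  gzip -dc "$file" | awk -v col=$(( tissue + 1 )) -v thr="$threshold" '
    # Skip entries that are not plain numbers (header, NA, etc.).
    $col ~ /^[0-9.]+([eE][-+]?[0-9]+)?$/ && ($col + 0) < (thr + 0) {
      print $1, $col
    }'
}

# Example (prefix and start index are placeholders; the name follows the
# [output prefix].SNP.[start_SNP_index].tbt.ps.txt.gz pattern):
# tbt_hits /abs/path/MetaTissue.SNP.0.tbt.ps.txt.gz 2 1e-5
```

Using gzip -dc rather than zcat keeps the sketch working on systems where zcat only handles .Z files.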

--metatissue_bin_path [Metasoft dir] --no_mvalue

  These options relate to Metasoft. [Metasoft dir] is the "Metasoft" folder in the software package; please specify the full path to this folder. When --no_mvalue is specified, Metasoft does not compute the m-value for each tissue.

--dosage

  Turns on dosage mode. Use this when the genotype file contains dosage information created with the -x option of MetaTissueInputGenerator.



Here is a sample command that uses the input files in the "example/2_MetaTissue_input/" folder of the software package and writes its output to the "example/3_MetaTissue_output/" folder.


MetaTissueMM --expr [Full path to example folder]/2_MetaTissue_input/output_gene.txt \
  --geno [Full path to example folder]/2_MetaTissue_input/output_snp.txt \
  --matrix [Full path to example folder]/2_MetaTissue_input/matrix.txt \
  --output [Full path to example folder]/3_MetaTissue_output/MetaTissue \
  --metatissue_bin_path [Full path to Metasoft folder] \
  --java_path /usr/bin/java \
  --cov [Full path to example folder]/2_MetaTissue_input/output_cov.txt
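To avoid retyping the placeholders, the same sample command can be assembled from shell variables. In this sketch, EXAMPLE_DIR and METASOFT_DIR are hypothetical variable names whose values are placeholders you must replace with absolute paths on your system. The command is kept in a bash array so that paths containing spaces stay intact, and it is only echoed (a dry run) rather than executed.

```shell
#!/usr/bin/env bash
# Placeholders: replace with absolute paths on your system.
EXAMPLE_DIR=/path/to/example
METASOFT_DIR=/path/to/Metasoft

IN="$EXAMPLE_DIR/2_MetaTissue_input"
OUT="$EXAMPLE_DIR/3_MetaTissue_output"

# Same options as the sample command above, one word per array element.
cmd=(
  MetaTissueMM
  --expr   "$IN/output_gene.txt"
  --geno   "$IN/output_snp.txt"
  --matrix "$IN/matrix.txt"
  --output "$OUT/MetaTissue"
  --metatissue_bin_path "$METASOFT_DIR"
  --java_path /usr/bin/java
  --cov    "$IN/output_cov.txt"
)

# Dry run: print the command. Run "${cmd[@]}" instead to execute it.
echo "${cmd[@]}"
```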



IMPORTANT: Please use full (absolute) paths (e.g. /usr/home/[user_id]) rather than relative paths (e.g. "../") or a tilde ("~").
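If your paths start out relative or with a tilde, you can convert them to absolute paths before passing them to MetaTissueMM. The sketch below is one way to do this in bash; the function name to_abs is hypothetical, and it assumes the directory (or, for a file, its parent directory) already exists. Bash already expands an unquoted ~ at the start of a word, so the helper mainly matters for relative paths and quoted tildes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an absolute path for a file or directory.
# Assumes the directory (or the file's parent directory) already exists.
to_abs() {
  local p=$1
  # Expand a leading "~/" (e.g. when the path was quoted) to $HOME.
  [[ $p == "~/"* ]] && p="$HOME/${p:2}"
  if [[ -d $p ]]; then
    (CDPATH= cd -- "$p" && pwd)
  else
    # Resolve the parent directory, then append the file name.
    local dir base
    dir=$(CDPATH= cd -- "$(dirname -- "$p")" && pwd) || return 1
    base=$(basename -- "$p")
    echo "$dir/$base"
  fi
}

# Usage (placeholder path): EXAMPLE_DIR=$(to_abs ../example)
```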