Multiple testing in transformed space (MultiTrans)

Download

MultiTrans.zip contains the following:

Prerequisite (MultiTrans_pylmm)
pylmmKinship.py : estimate a kinship matrix
pylmmGWAS_multiPhHeri.py : estimate variance components of the data
Prepare input for MultiTrans
generateR.R : generate the correlation matrix
generateC.jar : convert the correlation matrix into the MultiTrans format
Run MultiTrans (MultiTrans_MVN)
MultiTrans_1prep
MultiTrans_2run
MultiTrans_3sort
MultiTrans_4correct
Test data: see HowToRun.txt
License

User's guide

Steps for running MultiTrans

Prerequisites
1. Estimate a kinship matrix, K, from the genotypes. You can use any software you like, for example Pylmm (pylmmKinship.py).
2. Estimate the variance components sigma_g^2 and sigma_e^2 of the data. Again, any software can be used, for example Pylmm (pylmmGWAS_multiPhHeri.py).
  ♦ Pylmm is a linear mixed model solver developed in our group (for details, see Pylmm). For convenience, it is included in the download package.
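As an illustration of prerequisite step 1, the sketch below computes a common realized-relationship estimate, K = Z Z^T / m, where Z holds each SNP column standardized to mean 0 and variance 1. This estimator is an assumption chosen for illustration: pylmmKinship.py or other software may scale or filter SNPs differently. The function name standardized_kinship is hypothetical and not part of the MultiTrans package.

```python
import math


def standardized_kinship(genotypes):
    """Return K = Z Z^T / m_used for an n-by-m genotype matrix (list of lists).

    Each SNP column is centered to mean 0 and scaled to unit (population)
    variance; monomorphic SNPs are skipped because they cannot be scaled.
    Hypothetical helper for illustration, not part of the MultiTrans package.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    columns = []
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        if var == 0:
            continue  # monomorphic SNP: carries no information about relatedness
        sd = math.sqrt(var)
        columns.append([(x - mean) / sd for x in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    used = len(columns)
    # K[a][b] is the average over SNPs of z_a * z_b
    return [[sum(c[a] * c[b] for c in columns) / used for b in range(n)]
            for a in range(n)]


if __name__ == "__main__":
    G = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1]]  # 4 individuals, 3 SNPs
    for row in standardized_kinship(G):
        print(" ".join(f"{v:6.3f}" for v in row))
```

Because every standardized column has a sum of squares equal to n, the diagonal of K averages exactly 1, which is a quick sanity check on any kinship file you produce.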
Prepare input (n is the number of individuals and m is the number of genotypes)
1. run generateR.R
  prerequisites: the gtools and mvnorm R libraries
  input: genotypes (Xpath, n by m matrix), kinship (Kpath, n by n matrix), variance components (VCpath; the first column contains sigma_g^2 and the second column contains sigma_e^2)
  output: r.txt
  Usage: R CMD BATCH --args -Xpath="" -Kpath="" -VCpath="" -outputPath="" generateR.R generateR.log
2. run generateC.jar
  input: r.txt (m by m correlation matrix generated by generateR.R), window size (1000 was used in the MultiTrans paper; see the paper for details)
  output: c.txt
  Usage: java -jar generateC.jar windowSize r.txt c.txt
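The kind of quantity generateR.R produces can be sketched as follows, under stated assumptions. Assume the model y = mu*1 + beta*x_i + g + e with Cov(y) = V = sigma_g^2 K + sigma_e^2 I, an intercept as the only other fixed effect, and a generalized least squares test per SNP. Under the null, the correlation between the statistics of SNPs i and j is then r_ij = x_i^T P x_j / sqrt(x_i^T P x_i * x_j^T P x_j), with P = V^{-1} - V^{-1} 1 1^T V^{-1} / (1^T V^{-1} 1). This is a minimal pure-Python sketch for small n and m; it is not claimed to reproduce generateR.R line for line, and all function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def statistic_correlation(X, K, sigma_g2, sigma_e2):
    """m-by-m null correlation of per-SNP GLS statistics (hypothetical helper).

    X: n-by-m genotypes (SNPs must be polymorphic), K: n-by-n kinship,
    sigma_g2 / sigma_e2: the two variance components.
    """
    n, m = len(X), len(X[0])
    V = [[sigma_g2 * K[a][b] + (sigma_e2 if a == b else 0.0) for b in range(n)]
         for a in range(n)]
    L = cholesky(V)
    v1 = chol_solve(L, [1.0] * n)          # V^{-1} 1
    denom = sum(v1)                        # 1^T V^{-1} 1
    cols = [[X[a][j] for a in range(n)] for j in range(m)]
    vx = [chol_solve(L, c) for c in cols]  # V^{-1} x_j
    sx = [sum(v) for v in vx]              # 1^T V^{-1} x_j

    def p_inner(i, j):
        # x_i^T P x_j: the intercept is projected out in the V^{-1} metric
        return sum(c * v for c, v in zip(cols[i], vx[j])) - sx[i] * sx[j] / denom

    diag = [p_inner(j, j) for j in range(m)]
    return [[p_inner(i, j) / math.sqrt(diag[i] * diag[j]) for j in range(m)]
            for i in range(m)]
```

With sigma_g^2 = 0, P reduces to the centering matrix divided by sigma_e^2, so r_ij becomes the ordinary Pearson correlation between genotype columns; comparing against that case is a convenient sanity check.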
Run MultiTrans
1. MultiTrans_1prep: data pre-processing.
  Usage: ./MultiTrans_1prep [-C] [c.txt] [window size] [output:prep file]
2. MultiTrans_2run: run the actual sampling.
  Usage: ./MultiTrans_2run [prep file] [output:max stat file] [#sampling] [seed]
  10000000 samples were used in the MultiTrans paper. You can set any random seed, such as 12345678.
3. MultiTrans_3sort: sort the maximum statistic.
  Usage: ./MultiTrans_3sort [output:sorted file] [max stat file_1] [max stat file_2] [max stat file_3]
  If you divided the genomic region into N independent regions (e.g. chromosomes), provide all the max-stat files. In that case, every file must have been generated with the same number of samples. If you did not divide the region, provide just one max-stat file. [sorted file] stores the sorted maximum statistics over the whole genome.
4. MultiTrans_4correct: correct p-values.
  Usage: ./MultiTrans_4correct -p [sorted file] [pointwise-p file] [final output file]
  [pointwise-p file] is a text file containing the pointwise p-values you want to correct, delimited by spaces or newlines. The testData folder contains a file named threshold.txt that you can use as the pointwise-p file.
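Conceptually, the sampling step draws the vector of SNP statistics from a multivariate normal distribution MVN(0, R) many times and records the maximum absolute statistic of each draw. The sketch below does this directly with a full Cholesky factor of R. That is a simplification: SLIDE, on which MultiTrans is based, uses a sliding window of conditional distributions (controlled by the window size) so that the full m by m matrix never has to be factored. The sketch also assumes R is positive definite, and the function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def sample_max_stats(R, num_samples, seed):
    """Return max_i |z_i| for num_samples draws of z ~ MVN(0, R) (hypothetical)."""
    L = cholesky(R)
    rng = random.Random(seed)  # fixed seed gives reproducible draws
    m = len(R)
    maxima = []
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(m)]  # independent N(0, 1) values
        # z = L u has covariance L L^T = R
        z = [sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(m)]
        maxima.append(max(abs(v) for v in z))
    return maxima
```

For a single SNP (R = [[1.0]]), about 5% of the sampled maxima should exceed 1.96, which matches the pointwise two-sided 0.05 threshold; with correlated SNPs the maxima are smaller than they would be for independent ones, which is exactly what makes the correction less conservative than Bonferroni.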
For details of the options for running MultiTrans, see SLIDE Usage, as the MultiTrans software is based on SLIDE.
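The sorting and correction steps can be sketched as follows, under stated assumptions. Because the regions are independent, draw k from each region's max-stat file can be paired with draw k from every other file, and the genome-wide maximum for that draw is the largest of the regional maxima; this pairing is why every file must contain the same number of samples. A pointwise p-value p is then converted to the two-sided threshold t = Phi^{-1}(1 - p/2), and the corrected p-value is the fraction of sampled genome-wide maxima at or above t. Function names are hypothetical, p must lie in (0, 1], and the real tools may treat edge cases (for example, a corrected p-value of 0 when no sample reaches the threshold) differently.

```python
import bisect
from statistics import NormalDist


def combine_regions(*region_maxima):
    """Genome-wide maxima from independent regions, sorted ascending (hypothetical).

    Draw k of every region is paired with draw k of the others, which is valid
    only because the regions are independent and requires equal sample counts.
    """
    if len({len(m) for m in region_maxima}) != 1:
        raise ValueError("every max-stat list must have the same number of samples")
    return sorted(max(draw) for draw in zip(*region_maxima))


def corrected_p(sorted_maxima, pointwise_p):
    """Fraction of sampled genome-wide maxima at or above the pointwise threshold."""
    # two-sided threshold: |z| >= t exactly when the pointwise p-value <= pointwise_p
    t = NormalDist().inv_cdf(1.0 - pointwise_p / 2.0)
    first = bisect.bisect_left(sorted_maxima, t)  # first index whose value is >= t
    return (len(sorted_maxima) - first) / len(sorted_maxima)
```

Keeping the maxima sorted means each correction is a binary search, so correcting many pointwise p-values against millions of samples stays cheap.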