Download

MultiTrans.zip contains the following:

User's guide

Steps for running MultiTrans
  1. Prerequisites
    1. Estimate a kinship matrix, K, from the genotypes. Any software can be used; for example, Pylmm (pylmmKinship.py) estimates a kinship matrix.
    2. Estimate the variance components sigma_g^2 and sigma_e^2 of the data. Any software can be used; for example, Pylmm (pylmmGWAS_multiPhHeri.py) estimates the variance components.
      ♦ Pylmm is a linear mixed model solver developed in our group (for details, see Pylmm). For convenience, I added the software to the download package.
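For readers who want to see what a kinship estimate looks like, here is a minimal sketch of one common estimator (the realized relationship matrix from standardized genotypes). This is an illustration under stated assumptions, not necessarily the computation performed by pylmmKinship.py; the function name `kinship` and the toy genotypes are hypothetical.

```python
import math

def kinship(genotypes):
    """Realized relationship matrix from an n-by-m genotype matrix (list of rows).

    Each SNP column is centered and scaled to unit variance, then
    K = Z Z^T / m_used, where m_used counts the non-constant SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    z_cols = []
    for j in range(m):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var == 0.0:
            continue  # a monomorphic SNP carries no information; skip it
        sd = math.sqrt(var)
        z_cols.append([(g - mean) / sd for g in col])
    if not z_cols:
        raise ValueError("no polymorphic SNPs in the genotype matrix")
    used = len(z_cols)
    return [
        [sum(z[i] * z[k] for z in z_cols) / used for k in range(n)]
        for i in range(n)
    ]

# Toy data (hypothetical): 3 individuals, 4 SNPs coded 0/1/2; the last SNP is constant.
toy = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 1, 0],
]
K = kinship(toy)
```

The result is an n by n symmetric matrix, which is the shape generateR.R expects for its kinship input.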
  2. Prepare input (n is the number of individuals and m is the number of genotypes)
    1. run generateR.R
      prerequisites: the gtools and mvnorm R libraries
      input: genotypes (Xpath, n by m matrix), kinship (Kpath, n by n matrix), variance components (VCpath; the first column contains sigma_g^2 and the second column contains sigma_e^2)
      output: r.txt
      Usage: R CMD BATCH --args -Xpath="" -Kpath="" -VCpath="" -outputPath="" generateR.R generateR.log
    2. run generateC.jar
      input: r.txt (m by m correlation matrix generated by generateR.R), window size (1000 was used in the MultiTrans paper; see the paper for details)
      output: c.txt
      Usage: java -jar generateC.jar windowSize r.txt c.txt
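Because generateR.R depends on the three inputs having consistent dimensions, it can help to check them before starting. The sketch below assumes the files are whitespace-delimited numeric text with one matrix row per line, which the guide does not state; the helper names `matrix_shape` and `check_inputs` are hypothetical.

```python
def matrix_shape(lines, name="matrix"):
    """Return (rows, cols) of a whitespace-delimited matrix given as text lines."""
    rows = 0
    cols = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if cols is None:
            cols = len(fields)
        elif len(fields) != cols:
            raise ValueError(
                f"{name}: line {line_no} has {len(fields)} fields, expected {cols}"
            )
        rows += 1
    return rows, cols or 0

def check_inputs(x_lines, k_lines, vc_lines):
    """Check X is n by m, K is n by n, and VC has at least two columns; return (n, m)."""
    n, m = matrix_shape(x_lines, "genotypes")
    k_shape = matrix_shape(k_lines, "kinship")
    if k_shape != (n, n):
        raise ValueError(f"kinship is {k_shape}, expected ({n}, {n})")
    _, vc_cols = matrix_shape(vc_lines, "variance components")
    if vc_cols < 2:
        raise ValueError("variance components need sigma_g^2 and sigma_e^2 columns")
    return n, m

# Toy input (hypothetical): any iterable of lines works, including open file handles.
example_shape = check_inputs(
    ["0 1 2 0", "1 1 0 0", "2 0 1 0"],
    ["1 0 0", "0 1 0", "0 0 1"],
    ["0.4 0.6"],
)
```

With real files, pass open handles instead of lists, for example `with open(x_path) as x, open(k_path) as k, open(vc_path) as vc: check_inputs(x, k, vc)`.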
  3. Run MultiTrans
    1. MultiTrans_1prep: data pre-processing.
      Usage: ./MultiTrans_1prep [-C] [c.txt] [window size] [output:prep file]
    2. MultiTrans_2run: run the actual sampling.
      Usage: ./MultiTrans_2run [prep file] [output:max stat file] [#sampling] [seed]
      The MultiTrans paper used 10000000 samplings. You can set any random seed, e.g. 12345678.
    3. MultiTrans_3sort: sort the maximum statistic.
      Usage: ./MultiTrans_3sort [output:sorted file] [max stat file_1] [max stat file_2] [max stat file_3]
      If you divided the genome into N independent regions (e.g. chromosomes), provide all N max-stat files; the number of samplings must be identical for every file. If you did not divide the genome, provide a single max-stat file. [sorted file] stores the sorted maximum statistics over the whole genome.
    4. MultiTrans_4correct: correct p-values.
      Usage: ./MultiTrans_4correct -p [sorted file] [pointwise-p file] [final output file]
      [pointwise-p file] is a text file containing the pointwise p-values you want to correct, delimited by spaces or newlines. The testData folder contains a file named threshold.txt that you can use as the pointwise-p file.
    For details on the options for running MultiTrans, see SLIDE Usage; MultiTrans is built on SLIDE.
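The four MultiTrans steps above can be chained from a small driver script. The sketch below only builds the argument lists, reading each bracketed item in the usage lines as a positional placeholder (an assumption); the function name `multitrans_commands`, the `prefix` and `bin_dir` parameters, and the output file names are hypothetical. It covers the single-region case with one max-stat file.

```python
import shlex

def multitrans_commands(c_path, window_size, n_samplings, seed,
                        pointwise_p_path, prefix="out", bin_dir="."):
    """Build argv lists for the four MultiTrans steps (single region)."""
    prep = f"{prefix}.prep"            # hypothetical output names
    max_stat = f"{prefix}.maxstat"
    sorted_file = f"{prefix}.sorted"
    final = f"{prefix}.corrected"
    return [
        # 1. data pre-processing
        [f"{bin_dir}/MultiTrans_1prep", "-C", c_path, str(window_size), prep],
        # 2. sampling
        [f"{bin_dir}/MultiTrans_2run", prep, max_stat, str(n_samplings), str(seed)],
        # 3. sort the maximum statistics
        [f"{bin_dir}/MultiTrans_3sort", sorted_file, max_stat],
        # 4. correct the pointwise p-values
        [f"{bin_dir}/MultiTrans_4correct", "-p", sorted_file, pointwise_p_path, final],
    ]

# Values taken from the guide: window size 1000, 10000000 samplings, seed 12345678.
commands = multitrans_commands(
    "c.txt", 1000, 10000000, 12345678, "testData/threshold.txt"
)
for argv in commands:
    print(shlex.join(argv))  # dry run: show each command without executing it
```

To actually execute the steps in order, each list can be passed to `subprocess.run(argv, check=True)`, which stops the pipeline at the first failing step.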