Documentation

For large datasets, we suggest splitting the files by window size and submitting the pieces to a cluster in parallel for better performance. You can combine the results in step 3.3 (slide_3sort); the details are described below.

Download

MultiTrans.zip (MultiTrans.tar.gz) contains the following:

  • Pylmm_MultiTrans : modified Pylmm for MultiTrans
    pylmmKinship.py : estimate a kinship matrix
    pylmmGWAS_multiPhHeri.py : estimate variance components of the data

  • Estimate correlation in the rotated space
    GAMMA.R
    generateR.R : generate correlation in the rotated space
    generateC.jar : change correlation into SLIDE format

  • Test data: see HowToRun.txt

  • License

User's guide

  1. Prerequisites

    Estimate a kinship matrix, K. You can estimate the kinship matrix from genotypes with any software you like, for example Pylmm (pylmmKinship.py).

    Estimate the variance components sigma_g^2 and sigma_e^2 of the data. You can estimate them with any software you like, for example Pylmm (pylmmGWAS_multiPhHeri.py).
    ♦ Pylmm is a linear mixed model solver developed in our group (for details, see Pylmm). We have modified Pylmm for MultiTrans; the modified version is included in Pylmm_MultiTrans.
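
    As an illustration of what a kinship matrix is, here is a minimal sketch of one common estimator: standardize each marker column and take K = W W^T / m. This is an assumption made for illustration only; it is not taken from pylmmKinship.py, which may handle missing data and normalization differently. The function name kinship is hypothetical.

```python
from statistics import mean, pstdev


def kinship(genotypes):
    """Estimate an n by n kinship matrix from an n by m genotype matrix.

    genotypes: list of n rows (individuals), each holding m marker values.
    Assumption: K = W W^T / m, where W is the column-standardized matrix.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Standardize each marker; monomorphic markers carry no information
    # about relatedness, so they are skipped.
    columns = []
    for j in range(m):
        col = [row[j] for row in genotypes]
        sd = pstdev(col)
        if sd == 0:
            continue
        mu = mean(col)
        columns.append([(x - mu) / sd for x in col])

    used = len(columns)
    if used == 0:
        raise ValueError("no polymorphic markers to estimate kinship from")

    # K[i][k] is the average product of standardized genotypes of i and k.
    return [
        [sum(c[i] * c[k] for c in columns) / used for k in range(n)]
        for i in range(n)
    ]
```

    With this normalization the diagonal of K averages to 1. For real data, use pylmmKinship.py or another dedicated tool.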

  2. Estimate correlation in the rotated space (n is the number of individuals and m is the number of genotypes)
    1. run generateR.R
      prerequisites: the gtools and mvnorm R libraries
      input: genotypes (Xpath, n by m matrix), kinship (Kpath, n by n matrix), variance components (VCpath; the first column contains sigma_g^2 and the second column contains sigma_e^2)
      output: r.txt
      Usage: R CMD BATCH --args -Xpath="" -Kpath="" -VCpath="" -outputPath="" generateR.R generateR.log
    2. run generateC.jar
      input: r.txt (m by m correlation matrix generated by generateR.R), window size (1000 was used in the MultiTrans paper; see the paper for details)
      output: c.txt
      Usage: java -jar generateC.jar windowSize r.txt c.txt
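
    The two commands of step 2 can be assembled from a small script, which helps when the same step is repeated for many inputs. The sketch below only mirrors the two Usage lines above; the function name step2_commands and its parameters are hypothetical, and the r_file argument assumes you know where generateR.R wrote r.txt (whether -outputPath is a directory or a file name is not stated here).

```python
import shlex


def step2_commands(x_path, k_path, vc_path, output_path, window_size,
                   r_file="r.txt", c_file="c.txt"):
    """Return the argument lists for generateR.R and generateC.jar.

    The argument order follows the Usage lines in this guide; nothing
    is executed here.
    """
    generate_r = [
        "R", "CMD", "BATCH", "--args",
        f"-Xpath={x_path}",
        f"-Kpath={k_path}",
        f"-VCpath={vc_path}",
        f"-outputPath={output_path}",
        "generateR.R", "generateR.log",
    ]
    generate_c = [
        "java", "-jar", "generateC.jar",
        str(window_size), r_file, c_file,
    ]
    return [generate_r, generate_c]


if __name__ == "__main__":
    # Print the commands for inspection; the file names are placeholders.
    for cmd in step2_commands("X.txt", "K.txt", "VC.txt", "out", 1000):
        print(shlex.join(cmd))
```

    To run the commands rather than print them, pass each list to subprocess.run(cmd, check=True) so that a failing step stops the pipeline.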
  3. Run SLIDE (for download and details, see SLIDE)
    1. slide_1prep: data pre-processing.
      Usage: ./slide_1prep [-C] [c.txt] [window size] [output:prep file]
    2. slide_2run: run the actual sampling.
      Usage: ./slide_2run [prep file] [output:max stat file] [#sampling] [seed]
      The MultiTrans paper used 10000000 samplings. You can set any random seed, such as 12345678.
    3. slide_3sort: sort the maximum statistic.
      Usage: ./slide_3sort [output:sorted file] [max stat file_1] [max stat file_2] [max stat file_3]
      If you divided the genomic region into N independent regions (e.g. chromosomes), provide all N max stat files. In that case, the number of samplings must be identical for every file. If you did not divide the region, just use one max stat file. [sorted file] stores the sorted maximum statistics over the whole genome.
    4. slide_4correct: correct p-values.
      Usage: ./slide_4correct -p [sorted file] [pointwise-p file] [final output file]
      [pointwise-p file] is a text file containing the pointwise p-values you want to correct, delimited by spaces or newlines. The testData folder contains a file named threshold.txt, which you can use as the pointwise-p file.

    ♦ SLIDE is a multivariate normal distribution (MVN)-based multiple hypothesis testing correction method developed in our group. For download and details, see SLIDE.
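
    When the genome is divided into independent regions, the four SLIDE steps above can be laid out as one command list per step, with slide_1prep and slide_2run repeated per region and a single slide_3sort and slide_4correct at the end. The following is a minimal sketch that follows the Usage lines in this guide. The function name slide_commands, the intermediate file names (regionN.prep, regionN.maxstat) and the default output names are hypothetical. Seeds are supplied by the caller, one per region, because this guide does not say how seeds should relate across regions.

```python
def slide_commands(c_files, window_size, n_sampling, seeds,
                   pointwise_p_file, sorted_file="sorted.txt",
                   final_file="corrected.txt"):
    """Return the SLIDE command lines for one or more regions.

    c_files: one c.txt file per independent region.
    n_sampling: shared by every region, since slide_3sort requires the
                number of samplings to be identical across files.
    seeds: one random seed per region, chosen by the caller.
    """
    if len(seeds) != len(c_files):
        raise ValueError("provide exactly one seed per c file")

    commands = []
    max_stat_files = []
    for i, (c_file, seed) in enumerate(zip(c_files, seeds), start=1):
        prep_file = f"region{i}.prep"         # hypothetical file name
        max_stat_file = f"region{i}.maxstat"  # hypothetical file name
        commands.append(
            ["./slide_1prep", "-C", c_file, str(window_size), prep_file])
        commands.append(
            ["./slide_2run", prep_file, max_stat_file,
             str(n_sampling), str(seed)])
        max_stat_files.append(max_stat_file)

    # A single sort merges the max stat files of all regions.
    commands.append(["./slide_3sort", sorted_file, *max_stat_files])
    commands.append(
        ["./slide_4correct", "-p", sorted_file, pointwise_p_file, final_file])
    return commands
```

    The per-region slide_1prep and slide_2run pairs do not depend on each other, so they are the natural units to submit to a cluster in parallel; slide_3sort and slide_4correct run once after all regions have finished.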

    For details or any other questions, please contact Jong Wha J Joo.