For large datasets, we suggest splitting the files into window-sized chunks and submitting them to a cluster in parallel for better performance. You can combine the results in step 3.C; the details are described below.
MultiTrans.zip (MultiTrans.tar.gz) contains the following:
Pylmm_MultiTrans : modified Pylmm for MultiTrans
pylmmKinship.py : estimates a kinship matrix
pylmmGWAS_multiPhHeri.py : estimates the variance components of the data
Estimate correlation in the rotated space
generateR.R : generates the correlation matrix in the rotated space
generateC.jar : converts the correlation matrix into SLIDE format
Test data: see HowToRun.txt
Estimate a kinship matrix, K. You can estimate the kinship matrix from the genotypes with any software you prefer, for example Pylmm (pylmmKinship.py).
Estimate the variance components sigma_g^2 and sigma_e^2 of the data. You can estimate the components with any software you prefer, for example Pylmm (pylmmGWAS_multiPhHeri.py).
♦ Pylmm is a linear mixed model solver developed in our group (for details, see Pylmm). The version modified for MultiTrans is included in Pylmm_MultiTrans.
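To make the kinship step concrete, here is a minimal sketch of one common estimator: standardize each genotype column and average the outer products. It is not taken from pylmmKinship.py, whose details may differ. The function name estimate_kinship is hypothetical, and the input is assumed to be an n by m matrix with individuals in rows.

```python
import numpy as np

def estimate_kinship(X):
    """Sketch of a realized-relationship kinship matrix (assumed estimator).

    X: n-by-m genotype matrix, individuals in rows, genotypes in columns.
    Returns an n-by-n symmetric matrix.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Drop genotypes that do not vary across individuals; they cannot be standardized.
    keep = std > 0
    Z = (X[:, keep] - mean[keep]) / std[keep]
    # Average of the per-genotype outer products z z^T.
    return Z @ Z.T / Z.shape[1]
```

With this scaling the trace of the result equals n, so the average diagonal entry is 1.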
- Estimate correlation in the rotated space (n is the number of individuals and m is the number of genotypes)
- run generateR.R
prerequisite: the gtools and mvnorm libraries of R
input: genotypes (Xpath, n by m matrix), kinship (Kpath, n by n matrix), variance components (VCpath; the first column contains sigma_g^2 and the second column contains sigma_e^2)
Usage: R CMD BATCH --args -Xpath="" -Kpath="" -VCpath="" -outputPath="" generateR.R generateR.log
- run generateC.jar
input: r.txt (m by m correlation matrix generated by generateR.R), windowSize (1000 was used in the MultiTrans paper; see the paper for details)
Usage: java -jar generateC.jar windowSize r.txt c.txt
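The idea behind the rotated-space correlation can be sketched as follows. This is an illustration under stated assumptions, not the code in generateR.R: it assumes the phenotype covariance is Sigma = sigma_g^2 K + sigma_e^2 I, that genotypes are rotated by Sigma^(-1/2), and that the correlation is the normalized inner product of the centered, rotated genotype columns. The function name rotated_correlation is hypothetical, and every genotype column is assumed to vary across individuals.

```python
import numpy as np

def rotated_correlation(X, K, sigma_g2, sigma_e2):
    """Sketch: m-by-m correlation of genotypes after rotation by Sigma^(-1/2).

    X: n-by-m genotypes, K: n-by-n kinship, sigma_g2 / sigma_e2: variance components.
    """
    X = np.asarray(X, dtype=float)
    K = np.asarray(K, dtype=float)
    n = X.shape[0]
    # Assumed covariance of the phenotype under the linear mixed model.
    Sigma = sigma_g2 * K + sigma_e2 * np.eye(n)
    # Symmetric eigendecomposition gives Sigma^(-1/2) = V diag(w^(-1/2)) V^T.
    w, V = np.linalg.eigh(Sigma)
    inv_sqrt = (V / np.sqrt(w)) @ V.T
    # Center each genotype column, then rotate.
    Xr = inv_sqrt @ (X - X.mean(axis=0))
    # Normalized inner products between rotated columns.
    norms = np.linalg.norm(Xr, axis=0)
    return (Xr.T @ Xr) / np.outer(norms, norms)
```

When sigma_g2 is 0 and sigma_e2 is 1, the rotation is the identity and the result reduces to the ordinary Pearson correlation between genotype columns.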
- Run SLIDE (for download and details, see SLIDE)
- slide_1prep: data pre-processing.
Usage: ./slide_1prep [-C] [c.txt] [window size] [output:prep file]
- slide_2run: run the actual sampling.
Usage: ./slide_2run [prep file] [output:max stat file] [#sampling] [seed]
The MultiTrans paper used 10000000 samplings; you can set any random seed, such as 12345678.
- slide_3sort: sort the maximum statistic.
Usage: ./slide_3sort [output:sorted file] [max stat file_1] [max stat file_2] [max stat file_3]
If you divided the genome into N independent regions (e.g. chromosomes), provide all the max-stat files. In that case, the number of samplings has to be identical for every file. If you did not divide the genome, just use one max-stat file. [sorted file] stores the sorted maximum statistics over the whole genome.
- slide_4correct: correct p-values.
Usage: ./slide_4correct -p [sorted file] [pointwise-p file] [final output file]
[pointwise-p file] is a text file containing the pointwise p-values you want to correct, delimited by spaces or newlines. The testData folder contains a file named threshold.txt which you can use as the pointwise-p file.
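Conceptually, correcting a pointwise p-value against sorted maximum statistics can be sketched as below. This is not SLIDE's implementation or file format. It assumes that the sampled maxima are absolute z-scores, that tests are two-sided, and that the corrected p-value is the fraction of sampled maxima at or above the pointwise threshold. The function name corrected_p is hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist

def corrected_p(sorted_max_stats, pointwise_p):
    """Sketch: fraction of sampled maxima at or above the pointwise threshold.

    sorted_max_stats: ascending list of sampled maximum |z| statistics (assumed).
    pointwise_p: two-sided pointwise p-value to correct.
    """
    if not sorted_max_stats:
        raise ValueError("need at least one sampled maximum statistic")
    # Two-sided |z| threshold that corresponds to the pointwise p-value.
    z = NormalDist().inv_cdf(1.0 - pointwise_p / 2.0)
    # Index of the first sampled maximum that is >= z.
    first = bisect_left(sorted_max_stats, z)
    return (len(sorted_max_stats) - first) / len(sorted_max_stats)
```

Because the list is sorted once, each lookup is a binary search, which is why sorting the maxima before correcting many p-values is convenient.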
♦ SLIDE is a multivariate normal distribution (MVN)-based multiple hypothesis testing correction method developed in our group. For download and details, see SLIDE
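The four SLIDE steps above can be chained for several independent regions. The sketch below only builds the command lines, following the Usage strings given above; it does not run anything. The function name build_slide_commands and all file names (the ".prep" and ".max" suffixes, sorted.txt, corrected.txt) are hypothetical. Reusing one seed for every region is a simplification for illustration, not a documented requirement.

```python
def build_slide_commands(c_files, window_size, n_sampling, seed, pointwise_p_file,
                         sorted_file="sorted.txt", output_file="corrected.txt"):
    """Return SLIDE command lines (argv lists) for one c.txt file per region."""
    commands = []
    max_files = []
    for c_file in c_files:
        prep_file = c_file + ".prep"   # hypothetical naming
        max_file = c_file + ".max"     # hypothetical naming
        # Step 1: pre-process the correlation file of this region.
        commands.append(["./slide_1prep", "-C", c_file, str(window_size), prep_file])
        # Step 2: sample; the number of samplings is the same for every region.
        commands.append(["./slide_2run", prep_file, max_file, str(n_sampling), str(seed)])
        max_files.append(max_file)
    # Step 3: sort the maximum statistics across all regions.
    commands.append(["./slide_3sort", sorted_file] + max_files)
    # Step 4: correct the pointwise p-values.
    commands.append(["./slide_4correct", "-p", sorted_file, pointwise_p_file, output_file])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the directory containing the SLIDE binaries, or the per-region step 1 and step 2 commands can be written to separate cluster job scripts before steps 3 and 4 are run once.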
For details or any other questions, please contact Jong Wha J Joo.