Documentation
Download
MARS.zip (MARS.tar.gz) contains the following:
readme.txt : System requirements, installation guide, instructions for use
readme_testData.txt : Instructions for testing the software on the sample data, including the test environment
MARS: main program to run the software
MARS_alt.R, MARS_NULL.R, computePvalue.R, computePvalue_GWAS.R, generateLD.R : scripts for preprocessing the data to run MARS
sample_data: test_GENO, test_STAT, expectedResult
LICENSE
Installation
User's guide
Installation (Requirement)
R (R libraries: Matrix, mvtnorm), g++, gsl
Run MARS
Goal: Compute the likelihood ratio test (LRT) score for the given statistics and genotypes
Usage: ./MARS -z stat [-x genotype or -l LD_matrix] -n number_of_samples -m number_of_simulations -o output [-a 0_for_null/1_for_alternative(default:1)] [-c number_of_causal_variants(default:2)]
Option:
In the dimensions below, m is the number of SNPs (genotypes) and n is the number of samples (individuals). Note that the -m option is the number of simulations, not the number of SNPs.
-z stat file path (1 x 50, i.e. 50 summary statistics; or number_of_simulations x 101 for the null analysis)
-x genotype file path (m x n)
-l ld file path (m x m)
-n number of samples (individuals)
-m number of simulations (used for the null analysis)
-o output file path
-a analysis mode: 0 for the null analysis, 1 for the alternative (default: 1)
-c number of causal variants to consider in the analysis (default: 2)
output: (LRT_score, univariate_pvalue), or (weight, LRT_score, univariate_pvalue) for the null analysis
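The option list above can be turned into a small command builder. The following is only a sketch: build_mars_command is a hypothetical helper that is not part of MARS, and it simply assembles the flags from the usage line without running anything.

```python
from typing import List, Optional


def build_mars_command(stat: str, output: str, n_samples: int,
                       genotype: Optional[str] = None,
                       ld: Optional[str] = None,
                       n_simulations: Optional[int] = None,
                       alternative: bool = True,
                       n_causal: int = 2) -> List[str]:
    """Assemble the argument list for one ./MARS call (hypothetical helper)."""
    # The usage line accepts either -x (genotypes) or -l (LD matrix), not both.
    if (genotype is None) == (ld is None):
        raise ValueError("give exactly one of genotype (-x) or ld (-l)")
    cmd = ["./MARS", "-z", stat]
    cmd += ["-x", genotype] if genotype is not None else ["-l", ld]
    cmd += ["-n", str(n_samples)]
    # -m is only passed when a number of simulations is given (null analysis).
    if n_simulations is not None:
        cmd += ["-m", str(n_simulations)]
    # -a: 1 for the alternative (data) analysis, 0 for the null analysis.
    cmd += ["-o", output, "-a", "1" if alternative else "0", "-c", str(n_causal)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_mars_command(
        "sample_data/test_STAT50", "sample_data/test_output", 338,
        genotype="sample_data/test_GENO50")))
```

The returned list can be passed to subprocess.run from the MARS directory; the helper always writes -a and -c explicitly so that the defaults do not have to be remembered.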
Analysis example with a test data
- Compute LRT score for the data
- Data preparation : Select the top 50 statistics and the corresponding SNPs
Usage : R CMD BATCH '--args -g=genotypePath -s=statPath -o=output_genotype -u=output_stat [-t=topNum(default:50)]' MARS_alt.R
Input : stat(m x 1), geno(m x n)
Output : stat(1 x 50) geno(50 x n)
Command : R CMD BATCH '--args -g=sample_data/test_GENO -s=sample_data/test_STAT -o=sample_data/test_GENO50 -u=sample_data/test_STAT50 -t=50' MARS_alt.R
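As an illustration of the data preparation step, the selection can be sketched in a few lines of Python. This is not the code of MARS_alt.R: select_top is a hypothetical name, and the sketch assumes that "top" means the largest absolute statistic and that the output is ordered by rank.

```python
from typing import List, Sequence, Tuple


def select_top(stats: Sequence[float],
               geno: Sequence[Sequence[float]],
               top_num: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Keep the top_num statistics and the matching genotype rows.

    stats has one entry per SNP (m values); geno has one row per SNP (m x n).
    Assumption: SNPs are ranked by the absolute value of their statistic.
    """
    if len(stats) != len(geno):
        raise ValueError("stats and geno must describe the same SNPs")
    # Indices of the SNPs, strongest statistic first.
    order = sorted(range(len(stats)), key=lambda i: abs(stats[i]), reverse=True)
    keep = order[:top_num]
    return [stats[i] for i in keep], [list(geno[i]) for i in keep]


if __name__ == "__main__":
    top_stats, top_geno = select_top([0.5, -3.0, 2.0, 1.0],
                                     [[0, 1], [1, 2], [2, 0], [1, 1]],
                                     top_num=2)
    print(top_stats, top_geno)
```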
- Run MARS to compute LRT score
Usage : ./MARS -z stat [-x genotype or -l LD_matrix] -n number_of_samples -o output [-a 0_for_null/1_for_alternative(default:1)] [-c number_of_causal_variants(default:2)]
Input : stat (1x50) geno(m x n) or ld(m x m)
Output : pvalue_UNI LRT_score (2x1)
Use genotypes (the LD matrix is generated from the SNPs; the SNPs must yield a positive semidefinite LD matrix, otherwise supply an LD matrix with the -l option)
Command : ./MARS -z sample_data/test_STAT50 -x sample_data/test_GENO50 -n 338 -o sample_data/test_output -a 1
Use ld matrix
Command : R CMD BATCH '--args -g=sample_data/test_GENO50 -o=sample_data/test_LD50' generateLD.R
./MARS -z sample_data/test_STAT50 -l sample_data/test_LD50 -n 338 -o sample_data/test_output2 -a 1
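To make the relation between the genotype file and the LD file concrete, here is a minimal sketch that builds an m x m matrix from an m x n genotype matrix. It assumes that LD is the Pearson correlation between SNP rows; generateLD.R may use a different estimator, and ld_matrix is a hypothetical name.

```python
import math
from typing import List, Sequence


def ld_matrix(geno: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation between every pair of SNP rows (m x n -> m x m)."""
    unit_rows = []
    for row in geno:
        mean = sum(row) / len(row)
        dev = [g - mean for g in row]
        norm = math.sqrt(sum(d * d for d in dev))
        if norm == 0.0:
            # A SNP with no variation has no defined correlation.
            raise ValueError("monomorphic SNP: correlation is undefined")
        unit_rows.append([d / norm for d in dev])
    m = len(unit_rows)
    # The dot product of two centered, unit-length rows is their correlation.
    return [[sum(a * b for a, b in zip(unit_rows[i], unit_rows[j]))
             for j in range(m)] for i in range(m)]


if __name__ == "__main__":
    for line in ld_matrix([[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1]]):
        print(line)
```

A matrix computed this way from complete genotype data is positive semidefinite up to rounding error, since it is a Gram matrix of the centered rows.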
- Compute LRT score for the null
- Generate null samples
Usage : R CMD BATCH '--args -n=number_of_simulations -g=genotype -o=output [-f=MARS/fastMARS(0/1, default:0)] [-t=topNum]' MARS_NULL.R
Input : genotype(mxn)
Output : weight [stat1 index1 stat2 index2 ... stat50 index50](number_of_simulations x 101)
MARS
R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL -f=0 -t=50' MARS_NULL.R
fastMARS
R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL2 -f=1 -t=50' MARS_NULL.R
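The row layout of the null file (one weight followed by 50 stat/index pairs, 101 values in total) can be read as sketched below. The sketch assumes whitespace-separated values and integer-valued indices, neither of which is stated above; parse_null_row is a hypothetical helper.

```python
from typing import List, Tuple


def parse_null_row(line: str, top_num: int = 50) -> Tuple[float, List[float], List[int]]:
    """Split one null-sample row into weight, statistics and SNP indices.

    Expected layout: weight stat1 index1 stat2 index2 ... (1 + 2 * top_num values).
    Assumption: values are separated by whitespace.
    """
    fields = line.split()
    expected = 1 + 2 * top_num
    if len(fields) != expected:
        raise ValueError(f"expected {expected} values, found {len(fields)}")
    weight = float(fields[0])
    stats = [float(x) for x in fields[1::2]]          # stat1, stat2, ...
    indices = [int(float(x)) for x in fields[2::2]]   # index1, index2, ...
    return weight, stats, indices


if __name__ == "__main__":
    print(parse_null_row("0.5 1.2 3 -0.7 10", top_num=2))
```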
- Run MARS on the null samples
Usage : ./MARS -z stat [-x genotype or -l LD_matrix] -n number_of_samples -m number_of_simulations -o output [-a 0_for_null/1_for_alternative(default:1)] [-c number_of_causal_variants(default:2)]
Input : stat (number_of_simulations x 101) geno(m x n)
Output : weight pvalue_UNI LRT_score (number_of_simulationsx3)
MARS
Command: ./MARS -z sample_data/test_NULL -x sample_data/test_GENO -n 338 -o sample_data/test_NULL_output -a 0 -m 10000
fastMARS
Command: ./MARS -z sample_data/test_NULL2 -x sample_data/test_GENO -n 338 -o sample_data/test_NULL2_output -a 0 -m 10000
- Compute Pvalue
- Compute pvalue to identify the multiloci association
Description: Order the LRT_scores from the null samples and find the quantile of the LRT_score from the data among them to compute the pvalue
Usage: R CMD BATCH '--args -a=LRT_data -n=LRT_null -o=output [-t=threshold(default:0.05)]' computePvalue.R
Command: R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result -t=0.02797203' computePvalue.R
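The quantile-based pvalue described above can be sketched as follows. This is an illustration rather than the code of computePvalue.R: it assumes the pvalue is the fraction of null LRT_scores that are at least as large as the LRT_score from the data, and empirical_pvalue is a hypothetical name.

```python
from typing import Sequence


def empirical_pvalue(lrt_data: float, lrt_null: Sequence[float]) -> float:
    """Fraction of null LRT scores that are >= the score from the data.

    Assumption: no pseudo-count is added, so the result can be exactly 0
    when the data score exceeds every null score.
    """
    if not lrt_null:
        raise ValueError("at least one null score is required")
    exceed = sum(1 for score in lrt_null if score >= lrt_data)
    return exceed / len(lrt_null)


if __name__ == "__main__":
    null_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    p = empirical_pvalue(8.5, null_scores)
    print(p, p <= 0.05)
```

With 10000 null samples, as in the example commands, the smallest nonzero pvalue this estimate can return is 1/10000.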
- MARS for GWAS analysis
Description: Find the quantile of the univariate threshold (default: 5e-08) among the univariate pvalues of the null samples. Take the LRT_score at the same quantile of the null LRT_scores as the LRT_threshold. The data are significant if their LRT_score is greater than the LRT_threshold. See the manuscript for the details.
Usage: R CMD BATCH '--args -a=LRT_data -n=LRT_null -o=output [-f=MARS/fastMARS(0/1, default:0)] [-u=univariate threshold(default:5e-08)]' computePvalue_GWAS.R
Command: R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result_GWAS -f=0 -u=5e-6' computePvalue_GWAS.R
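One possible reading of the threshold procedure is sketched below. It is an interpretation of the description, not the code of computePvalue_GWAS.R, and the manuscript remains the authoritative source; lrt_threshold and its handling of ties are assumptions.

```python
from typing import Sequence


def lrt_threshold(null_pvalues: Sequence[float],
                  null_lrt: Sequence[float],
                  uni_threshold: float = 5e-08) -> float:
    """LRT score matching the quantile of uni_threshold in the null samples.

    Step 1: count the null samples whose univariate pvalue passes the
            univariate threshold (k of N).
    Step 2: return the null LRT score that leaves exactly k null scores
            strictly above it (assuming no ties).
    """
    if len(null_pvalues) != len(null_lrt) or not null_lrt:
        raise ValueError("need one pvalue and one LRT score per null sample")
    k = sum(1 for p in null_pvalues if p <= uni_threshold)
    descending = sorted(null_lrt, reverse=True)
    if k >= len(descending):
        return float("-inf")  # every null sample passes, so any score does
    return descending[k]


if __name__ == "__main__":
    threshold = lrt_threshold([0.5, 1e-7, 0.2, 3e-6, 0.9],
                              [1.0, 9.0, 2.0, 7.0, 0.5],
                              uni_threshold=5e-6)
    print(threshold, 3.0 > threshold)
```

Under this reading, the LRT test is calibrated to the same false positive rate in the null samples as the univariate test at the chosen univariate threshold.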
- fastMARS for GWAS analysis
Description: Use the weights to find the significance. Check the manuscript for the details.
Usage: R CMD BATCH '--args -a=LRT_data -n=LRT_null -o=output [-f=MARS/fastMARS(0/1, default:0)] [-u=univariate threshold(default:5e-08)]' computePvalue_GWAS.R
Command: R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL2_output_LRT -o=sample_data/test_result_GWAS2 -f=1 -u=5e-6' computePvalue_GWAS.R
- Running example
- How to run: Copy and paste each line in the MARS directory. The expected run time on a normal desktop computer is less than 2 hours, and the expected outputs are in the MARS/expectedResult/ directory.
- MARS example
R CMD BATCH '--args -g=sample_data/test_GENO -s=sample_data/test_STAT -o=sample_data/test_GENO50 -u=sample_data/test_STAT50 -t=50' MARS_alt.R
./MARS -z sample_data/test_STAT50 -x sample_data/test_GENO50 -n 338 -o sample_data/test_output -a 1
R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL -f=0 -t=50' MARS_NULL.R
./MARS -z sample_data/test_NULL -x sample_data/test_GENO -n 338 -o sample_data/test_NULL_output -a 0 -m 10000
R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result -t=0.02797203' computePvalue.R
R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result_GWAS -f=0 -u=5e-6' computePvalue_GWAS.R
- fastMARS example
./MARS -z sample_data/test_STAT50 -x sample_data/test_GENO50 -n 338 -o sample_data/test_output -a 1
R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL2 -f=1 -t=50' MARS_NULL.R
./MARS -z sample_data/test_NULL2 -x sample_data/test_GENO -n 338 -o sample_data/test_NULL2_output -a 0 -m 10000
R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL2_output_LRT -o=sample_data/test_result_GWAS2 -f=1 -u=5e-6' computePvalue_GWAS.R
For details or any other questions, please contact Jong Wha J Joo.