Documentation

Download

MARS.zip (MARS.tar.gz) contains the following:

  • readme.txt : System requirements, installation guide, instructions for use

  • readme_testData.txt : Instructions for testing the software on the sample data, including the test environment

  • MARS: main program to run the software

  • MARS_alt.R, MARS_NULL.R, computePvalue.R, computePvalue_GWAS.R, generateLD.R : scripts for preprocessing the data to run MARS

  • sample_data: test_GENO, test_STAT, expectedResult

  • LICENSE

Installation

User's guide

    Installation (Requirement)

    R (R libraries: Matrix, mvtnorm), g++, gsl
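A quick way to confirm that the requirements above are present before building or running anything is sketched below. It assumes the requirements are exposed as the executables `Rscript`, `g++` and `gsl-config`; those names may differ on your platform, and `missing_tools` is a hypothetical helper, not part of MARS.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: R, g++ and GSL provide these executables; adjust as needed.
missing=$(missing_tools Rscript g++ gsl-config)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All command-line tools found."
    # Try to load the two R libraries named in the requirements.
    Rscript -e 'library(Matrix); library(mvtnorm)' >/dev/null 2>&1 \
        || echo "R library Matrix or mvtnorm could not be loaded."
fi
```

The script only reports what it finds; it does not install anything.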

    Run MARS

    Goal: Compute LRT score for the given statistics and genotypes

    Usage: ./MARS -z stat [-x genotype or -l LD_matrix] -n number_of_samples -m number_of_simulations -o output [-a 0_for_null/1_for_alternative(default:1)] [-c number_of_causal_variants(default:2)]

    Option:

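    In the dimensions below, m is the number of SNPs (genotypes) and n is the number of samples (individuals). Note that the -m option is the number of simulations, not the number of SNPs.

    -z stat file path (1 x 50, i.e. 50 summary statistics) or (number_of_simulations x 101, for the null analysis)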

    -x genotype file path (m x n)

    -l ld file path (m x m)

    -n number of samples (individuals)

    -m number of simulations (used for the null analysis)

    -o output file path

    -a set 0 to analyze null statistics, 1 to analyze the alternative (data) statistics, default:1

    -c number of causal variants to consider in the analysis

    output: (LRT_score, univariate_pvalue), or (weight, LRT_score, univariate_pvalue) for the null analysis
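Because the options above expect files of specific dimensions (for example a 1 x 50 stat file), it can help to check a file's shape before passing it to MARS. The sketch below assumes the files are whitespace-delimited text matrices, which this guide does not state explicitly; `matrix_dims` is a hypothetical helper and the toy values are made up.

```shell
#!/bin/sh
# Print "rows cols" for a whitespace-delimited matrix file.
# Exits non-zero if the rows do not all have the same number of columns.
matrix_dims() {
    awk 'NF == 0 { next }
         cols == 0 { cols = NF }
         NF != cols { bad = 1 }
         { rows++ }
         END { if (bad) exit 1; print rows + 0, cols + 0 }' "$1"
}

# Toy stand-in for a 1 x 50 statistics file.
tmp=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 50; i++) printf "%s%.1f", (i > 1 ? " " : ""), i / 10
    print ""
}' > "$tmp"
matrix_dims "$tmp"    # prints: 1 50
rm -f "$tmp"
```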

    Analysis example with the test data

    1. Compute LRT score for the data

      • Data preparation : Select the top 50 statistics and the corresponding SNPs

        Usage : R CMD BATCH '--args -g=genotypePath -s=statPath -o=output_genotype -u=output_stat [-t=topNum(default:50)]' MARS_alt.R

        Input : stat(m x 1), geno(m x n)

        Output : stat(1 x 50) geno(50 x n)

        Command : R CMD BATCH '--args -g=sample_data/test_GENO -s=sample_data/test_STAT -o=sample_data/test_GENO50 -u=sample_data/test_STAT50 -t=50' MARS_alt.R

      • Run MARS to compute LRT score

        Usage : ./MARS -z stat [-x genotype or -l LD_matrix] -n number_of_samples -o output [-a 0_for_null/1_for_alternative(default:1)] [-c number_of_causal_variants(default:2)]
        Input : stat (1x50) geno(m x n) or ld(m x m)
        Output : pvalue_UNI LRT_score (2x1)

        Use genotypes (the LD matrix is generated from the SNPs; the SNPs must yield a positive semidefinite LD matrix, otherwise use the LD matrix option -l)
        Command : ./MARS -z sample_data/test_STAT50 -x sample_data/test_GENO50 -n 338 -o sample_data/test_output -a 1

        Use ld matrix
        Command : R CMD BATCH '--args -g=sample_data/test_GENO50 -o=sample_data/test_LD50' generateLD.R
        ./MARS -z sample_data/test_STAT50 -l sample_data/test_LD50 -n 338 -o sample_data/test_output2 -a 1
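To make the data preparation step in this section more concrete, the sketch below selects the rows with the largest statistics and keeps the matching genotype rows. It is only an illustration of the idea, not the code in MARS_alt.R: ranking by absolute value, the one-value-per-line stat layout, and the helper names `top_k_rows` and `keep_rows` are all assumptions, and the toy data are made up.

```shell
#!/bin/sh
# Print the 1-based row numbers of the k largest |statistic| values,
# highest first. Ranking by absolute value is an assumption of this sketch.
top_k_rows() {   # usage: top_k_rows stat_file k
    awk -v k="$2" '
        { v[NR] = ($1 < 0) ? -$1 : $1 + 0 }
        END {
            for (r = 1; r <= k && r <= NR; r++) {
                best = 0
                for (i = 1; i <= NR; i++)
                    if (!(i in used) && (best == 0 || v[i] > v[best]))
                        best = i
                used[best] = 1
                print best
            }
        }' "$1"
}

# Keep only the listed rows of a data file, in the original row order.
keep_rows() {    # usage: keep_rows index_file data_file
    awk 'NR == FNR { keep[$1] = 1; next } (FNR in keep)' "$1" "$2"
}

# Toy data: 4 SNPs x 3 samples, top 2 selected.
work=$(mktemp -d)
printf '%s\n' 0.5 -3.2 1.1 2.0 > "$work/stat"
printf '%s\n' '0 1 2' '1 1 0' '2 0 0' '0 0 1' > "$work/geno"
top_k_rows "$work/stat" 2 > "$work/idx"
keep_rows "$work/idx" "$work/stat" | paste -s -d ' ' -   # 1 x k stat row
keep_rows "$work/idx" "$work/geno"                       # k x n genotypes
rm -rf "$work"
```

Keeping both outputs in the original row order preserves the correspondence between each statistic and its SNP.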

    2. Compute LRT score for the null

      • Generate null samples
        Usage : R CMD BATCH '--args -n=number_of_simulations -g=genotype -o=output [-f=MARS/fastMARS(0/1, default:0)] [-t=topNum]' MARS_NULL.R
        Input : genotype(mxn)
        Output : weight [stat1 index1 stat2 index2 ... stat50 index50](number_of_simulations x 101)

        MARS
        R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL -f=0 -t=50' MARS_NULL.R

        fastMARS
        R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL2 -f=1 -t=50' MARS_NULL.R

      • Run MARS on the null samples
        Usage : ./MARS -z stat [-x genotype or -l LD_matrix] -n number_of_samples -m number_of_simulations -o output [-a 0_for_null/1_for_alternative(default:1)] [-c number_of_causal_variants(default:2)]
        Input : stat (number_of_simulations x 101) geno(m x n)
        Output : weight pvalue_UNI LRT_score (number_of_simulations x 3)

        MARS
        Command: ./MARS -z sample_data/test_NULL -x sample_data/test_GENO -n 338 -o sample_data/test_NULL_output -a 0 -m 10000

        fastMARS
        Command: ./MARS -z sample_data/test_NULL2 -x sample_data/test_GENO -n 338 -o sample_data/test_NULL2_output -a 0 -m 10000

    3. Compute Pvalue

      • Compute pvalue to identify the multiloci association
        Description: Order the LRT_scores from the null samples and find the quantile of the data's LRT_score among them to compute the p-value
        Usage: R CMD BATCH '--args -a=LRT_data -n=LRT_null -o=output [-t=threshold(default:0.05)]' computePvalue.R
        Command: R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result -t=0.02797203' computePvalue.R
      • MARS for GWAS analysis
        Description: Find the quantile of a univariate threshold(default:5e-08) from univariate pvalues from null samples. Find the LRT_threshold by finding the LRT_score of the quantile from the LRT_scores of null samples. Check if the LRT_score of the data is greater than the LRT_threshold to find the significance. See the manuscript for the details.
        Usage: R CMD BATCH '--args -a=LRT_data -n=LRT_null -o=output [-f=MARS/fastMARS(0/1, default:0)] [-u=univariate threshold(default:5e-08)]' computePvalue_GWAS.R
        Command: R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result_GWAS -f=0 -u=5e-6' computePvalue_GWAS.R
      • fastMARS for GWAS analysis
        Description: Use the weights to find the significance. Check the manuscript for the details.
        Usage: R CMD BATCH '--args -a=LRT_data -n=LRT_null -o=output [-f=MARS/fastMARS(0/1, default:0)] [-u=univariate threshold(default:5e-08)]' computePvalue_GWAS.R
        Command: R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL2_output_LRT -o=sample_data/test_result_GWAS2 -f=1 -u=5e-6' computePvalue_GWAS.R
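The p-value computation described in this step (locating the data's LRT_score within the null LRT_scores) can be sketched as an empirical tail fraction. The estimator below, the one-score-per-line null file layout, and the helper name `empirical_pvalue` are assumptions for illustration; computePvalue.R may use a different convention, such as a correction term, and the weighted fastMARS variant is not covered here.

```shell
#!/bin/sh
# Fraction of null scores that are >= the observed score.
# Assumption: the null file holds one LRT score per line.
empirical_pvalue() {   # usage: empirical_pvalue observed null_file
    awk -v obs="$1" '
        NF > 0 { n++; if ($1 + 0 >= obs + 0) ge++ }
        END { if (n == 0) exit 1; printf "%.6g\n", ge / n }' "$2"
}

# Toy null distribution: the scores 1, 2, ..., 10 (made-up values).
null=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print i }' > "$null"
empirical_pvalue 8.5 "$null"    # prints: 0.2
rm -f "$null"
```

With this estimator the smallest non-zero p-value is 1 / number_of_simulations, so the number of null samples bounds the resolution of the result.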
    4. Running example

      • How to run: Copy and paste each line below in the MARS directory. The expected run time on a normal desktop computer is less than 2 hours, and the expected outputs are in the MARS/expectedResult/ directory.

      • MARS example

        R CMD BATCH '--args -g=sample_data/test_GENO -s=sample_data/test_STAT -o=sample_data/test_GENO50 -u=sample_data/test_STAT50 -t=50' MARS_alt.R

        ./MARS -z sample_data/test_STAT50 -x sample_data/test_GENO50 -n 338 -o sample_data/test_output -a 1

        R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL -f=0 -t=50' MARS_NULL.R

        ./MARS -z sample_data/test_NULL -x sample_data/test_GENO -n 338 -o sample_data/test_NULL_output -a 0 -m 10000

        R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result -t=0.02797203' computePvalue.R

        R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result_GWAS -f=0 -u=5e-6' computePvalue_GWAS.R

      • fastMARS example

        ./MARS -z sample_data/test_STAT50 -x sample_data/test_GENO50 -n 338 -o sample_data/test_output -a 1

        R CMD BATCH '--args -n=10000 -g=sample_data/test_GENO -o=sample_data/test_NULL2 -f=1 -t=50' MARS_NULL.R

        ./MARS -z sample_data/test_NULL2 -x sample_data/test_GENO -n 338 -o sample_data/test_NULL2_output -a 0 -m 10000

        R CMD BATCH '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL2_output_LRT -o=sample_data/test_result_GWAS2 -f=1 -u=5e-6' computePvalue_GWAS.R
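Instead of pasting the example lines one at a time, they could be wrapped in a small script that stops at the first failing step. The sketch below shows one possible wrapper; `run_step` and the `DRY_RUN` variable are hypothetical names introduced here, and only two of the commands from the examples above are included for brevity.

```shell
#!/bin/sh
# Echo a command, then execute it unless DRY_RUN=1.
# Returns the exit status of the command so the caller can stop on failure.
DRY_RUN=${DRY_RUN:-1}   # prints only by default; set DRY_RUN=0 to execute

run_step() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] && return 0
    "$@"
}

run_step ./MARS -z sample_data/test_STAT50 -x sample_data/test_GENO50 \
    -n 338 -o sample_data/test_output -a 1 || exit 1
run_step R CMD BATCH \
    '--args -a=sample_data/test_output_LRT -n=sample_data/test_NULL_output_LRT -o=sample_data/test_result -t=0.02797203' \
    computePvalue.R || exit 1
```

Defaulting to a dry run lets you review the exact commands first, which is useful given that the full example can take up to two hours.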


For details or any other questions, please contact Jong Wha J Joo.