HARSH

HAplotype inference using Reference and Sequencing tecHnology (HARSH) is a method for inferring haplotypes from a haplotype reference panel and high-throughput sequencing data. It is based on a novel probabilistic model and a Gibbs sampler. HARSH was created by Wen-Yun Yang, Farhad Hormozdiari, Zhanyong Wang, Bogdan Pasaniuc and Eleazar Eskin.

Manual
Online user's guide
usage: harsh [options]
    --sfile <FILE>    sequencing read file
    --rfile <FILE>    reference haplotype file
    --mfile <FILE>    SNP map file
    --output <FILE>   output file for predicted haplotype and confidence
    -n                number of sampling iterations (default 10,000)
    -u                smoothing parameter for sampling (default 1)
    -e                sequencing error rate (default 0.01)
    -w                mismatch error rate between reference and donor haplotype (default 2e-3)
    -v                verbose level (default 1)

Please be advised that, in the current version, HARSH requires a separate run for each chromosome. It is not necessary to phase all chromosomes together.
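Because each chromosome needs its own run, a small driver script can assemble one command line per chromosome. The following is a minimal Python 3 sketch and is not part of HARSH. The `chr<N>.*` file naming scheme and the helper names `build_harsh_command` and `run_all` are assumptions made for illustration; the flags are the ones listed in the usage text above.

```python
import subprocess


def build_harsh_command(chrom, iterations=10000, seq_error=0.01, binary="harsh"):
    """Assemble the argument list for one per-chromosome HARSH run.

    The chr<N>.* file names are hypothetical; substitute your own paths.
    """
    prefix = "chr{}".format(chrom)
    return [
        binary,
        "--sfile", prefix + ".seq",
        "--rfile", prefix + ".ref",
        "--mfile", prefix + ".map",
        "--output", prefix + ".hap",
        "-n", str(iterations),
        "-e", str(seq_error),
    ]


def run_all(chromosomes):
    """Run HARSH once per chromosome, stopping at the first failure."""
    for chrom in chromosomes:
        subprocess.run(build_harsh_command(chrom), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even where the harsh binary is not installed.
    for chrom in (1, 2):
        print(" ".join(build_harsh_command(chrom)))
```

Building the argument list separately from running it makes it easy to inspect or log each command before launching a long sampling run.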
<--- myfile.seq --->
1   10101-10
11  00010--10
5   0011111
10  1---0010
8   00001111
3   0010101--1
5   00001-100

<--- myfile.ref --->
00000000000010000001
10101001010010010101
11101011011100010010
00000000000000000000
00000001000001000000
00101000100000100100
00100100100000000100
00000100100000000101
00101100100000000100
00100100000000000001
00101100100010000100
00000100100010000100
00100100000000000000
00100100100001000100
00000100100000010001
00100100000010000100
00100100000010000100
00100100000010000100
00100100000010000100
00100100000010000100

<--- myfile.map --->
1  snp1   0.000  5000650  0  1
1  snp2   0.012  5000830  0  1
1  snp3   0.024  5000835  0  1
1  snp4   0.067  5000840  0  1
1  snp5   0.080  5000845  0  1
1  snp6   0.102  5000848  0  1
1  snp7   0.104  5000870  0  1
1  snp8   0.156  5000881  0  1
1  snp9   0.159  5000893  0  1
1  snp10  0.165  5000901  0  1
1  snp11  0.178  5000914  0  1
1  snp12  0.189  5001000  0  1
1  snp13  0.193  5001010  0  1
1  snp14  0.204  5001105  0  1
1  snp15  0.250  5001202  0  1
1  snp16  0.260  5001290  0  1
1  snp17  0.270  5001295  0  1
1  snp18  0.306  5001400  0  1
1  snp19  0.506  5001502  0  1
1  snp20  0.510  5001530  0  1

The columns in myfile.seq are

The columns in myfile.ref are reference haplotypes. Each column represents one haplotype.

The columns in myfile.map are
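Since each column of the reference file is one haplotype, reading the file row by row and transposing it recovers the individual haplotypes. The following is a minimal Python 3 sketch and is not part of HARSH. It assumes that each row holds one 0/1 character per haplotype with no separators, as in the example above; `read_reference_haplotypes` is a hypothetical helper name.

```python
def read_reference_haplotypes(lines):
    """Return one string per reference haplotype (one per column).

    `lines` is any iterable of text rows, e.g. an open file handle.
    Assumes each non-empty row holds one 0/1 character per haplotype.
    """
    rows = [line.strip() for line in lines if line.strip()]
    if not rows:
        return []
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(
                "row {} has {} columns, expected {}".format(number, len(row), width)
            )
    # zip(*rows) transposes: the i-th tuple collects column i of every row.
    return ["".join(column) for column in zip(*rows)]


if __name__ == "__main__":
    example = ["0110", "1010", "0011"]  # toy data: 3 rows x 4 haplotypes
    print(read_reference_haplotypes(example))
```

To read a real file, pass an open handle, for example `read_reference_haplotypes(open("myfile.ref"))`. The width check catches truncated or ragged rows before they silently shift columns.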
<-- myfile.hap -->
10
10
11
00
01
00
11
00
11
10
10
10
00
00
01
10
00
01
10
00

Usage of the BAM/VCF convertor
python convertor.py -o outputFile -v xxx.vcf -b xxx.bam

Replace xxx with the real file name. Python 2.7 is required.