HARSH
HAplotype inference using Reference and Sequencing tecHnology (HARSH) is a method for inferring haplotypes from a haplotype reference panel and high-throughput sequencing data. It is based on a novel probabilistic model and a Gibbs sampler. HARSH was created by Wen-Yun Yang, Farhad Hormozdiari, Zhanyong Wang, Bogdan Pasaniuc and Eleazar Eskin.
Manual
Online user's guide
usage: harsh [options]
--sfile <FILE> sequencing read file
--rfile <FILE> reference haplotype file
--mfile <FILE> SNP map file
--output <FILE> output file for predicted haplotype and confidence
-n number of sampling iterations (default 10,000)
-u smoothing parameter for sampling (default 1)
-e sequencing error rate (default 0.01)
-w mismatch error rate between reference and donor haplotype (default 2e-3)
-v verbose level (default 1)
Please note that the current version of HARSH requires a separate run for each chromosome. Chromosomes do not need to be phased together.
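Because each chromosome needs its own run, it can help to generate the command lines with a small script. The following is a minimal Python 3 sketch that uses only the options listed above. The `chrN.seq` / `chrN.ref` / `chrN.map` / `chrN.hap` file naming, and the assumption that `-n` and `-e` take their value as the next argument, are illustrative and not confirmed by this manual.

```python
import shlex


def harsh_command(chrom, n_iter=10000, seq_error=0.01, prefix="chr"):
    """Build the argument list for one HARSH run on a single chromosome."""
    base = "%s%s" % (prefix, chrom)  # hypothetical file naming scheme
    return [
        "harsh",
        "--sfile", base + ".seq",   # sequencing read file
        "--rfile", base + ".ref",   # reference haplotype file
        "--mfile", base + ".map",   # SNP map file
        "--output", base + ".hap",  # predicted haplotype output
        "-n", str(n_iter),          # assumed form: flag followed by value
        "-e", str(seq_error),       # assumed form: flag followed by value
    ]


if __name__ == "__main__":
    # Print one command per autosome; run them one after another or in parallel.
    for chrom in range(1, 23):
        print(shlex.join(harsh_command(chrom)))
```

Each returned list can also be passed directly to `subprocess.run` instead of being printed.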
<--- myfile.seq --->
1 10101-10
11 00010--10
5 0011111
10 1---0010
8 00001111
3 0010101--1
5 00001-100
<--- myfile.ref --->
00000000000010000001
10101001010010010101
11101011011100010010
00000000000000000000
00000001000001000000
00101000100000100100
00100100100000000100
00000100100000000101
00101100100000000100
00100100000000000001
00101100100010000100
00000100100010000100
00100100000000000000
00100100100001000100
00000100100000010001
00100100000010000100
00100100000010000100
00100100000010000100
00100100000010000100
00100100000010000100
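To make the layout of the two example files concrete, here is a minimal Python 3 sketch that reads them into memory. It rests on assumptions inferred from the example rather than stated by the manual: the first field of a read line is a 1-based index of the first SNP the read covers, and "-" marks a SNP for which the read gives no allele. The function names are illustrative and not part of HARSH.

```python
def parse_reads(lines):
    """Return one {snp_index: allele} dict per read line."""
    reads = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        start, alleles = int(fields[0]), fields[1]
        # Assumption: start is 1-based; "-" means no allele at that SNP.
        reads.append(
            {start + i: int(a) for i, a in enumerate(alleles) if a != "-"}
        )
    return reads


def parse_reference(lines):
    """Return the reference panel as one allele list per haplotype."""
    rows = [line.strip() for line in lines if line.strip()]
    # Each column is one haplotype, so transpose the rows.
    return [[int(a) for a in column] for column in zip(*rows)]


# First three lines of each example file.
seq_lines = ["1 10101-10", "11 00010--10", "5 0011111"]
ref_lines = [
    "00000000000010000001",
    "10101001010010010101",
    "11101011011100010010",
]

reads = parse_reads(seq_lines)
haplotypes = parse_reference(ref_lines)
print(reads[0])         # alleles of the first read, keyed by SNP index
print(len(haplotypes))  # number of reference haplotypes (columns)
```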
<--- myfile.map --->
1 snp1 0.000 5000650 0 1
1 snp2 0.012 5000830 0 1
1 snp3 0.024 5000835 0 1
1 snp4 0.067 5000840 0 1
1 snp5 0.080 5000845 0 1
1 snp6 0.102 5000848 0 1
1 snp7 0.104 5000870 0 1
1 snp8 0.156 5000881 0 1
1 snp9 0.159 5000893 0 1
1 snp10 0.165 5000901 0 1
1 snp11 0.178 5000914 0 1
1 snp12 0.189 5001000 0 1
1 snp13 0.193 5001010 0 1
1 snp14 0.204 5001105 0 1
1 snp15 0.250 5001202 0 1
1 snp16 0.260 5001290 0 1
1 snp17 0.270 5001295 0 1
1 snp18 0.306 5001400 0 1
1 snp19 0.506 5001502 0 1
1 snp20 0.510 5001530 0 1
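The columns in myfile.seq, judging from the example, are:
1. index of the first SNP covered by the read
2. alleles observed at consecutive SNPs starting from that index ("-" where the read gives no allele)
The columns in myfile.ref are reference haplotypes. Each column represents one haplotype, and each row corresponds to one SNP in myfile.map.
The columns in myfile.map, judging from the example, are:
1. chromosome
2. SNP identifier
3. genetic position
4. physical position (base pairs)
5. and 6. the two allele codes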
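A quick consistency check on the map file can catch input problems before a long sampling run. The sketch below is in Python 3 and assumes, from the example only, that the six fields are chromosome, SNP identifier, genetic position, physical position and two allele codes. The field names and the checks themselves are illustrative and not part of HARSH.

```python
from collections import namedtuple

# Hypothetical field names inferred from the example map file.
MapEntry = namedtuple(
    "MapEntry", "chrom snp_id genetic_pos physical_pos allele0 allele1"
)


def parse_map(lines):
    """Parse whitespace-separated map lines into MapEntry records."""
    entries = []
    for line in lines:
        f = line.split()
        if len(f) != 6:
            continue  # skip blank or malformed lines
        entries.append(MapEntry(f[0], f[1], float(f[2]), int(f[3]), f[4], f[5]))
    return entries


def check_map(entries, n_ref_rows):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if len(entries) != n_ref_rows:
        problems.append(
            "map has %d SNPs but reference has %d rows" % (len(entries), n_ref_rows)
        )
    if len({e.chrom for e in entries}) > 1:
        problems.append("more than one chromosome in map file")
    for prev, cur in zip(entries, entries[1:]):
        if cur.physical_pos <= prev.physical_pos:
            problems.append("%s is not after %s" % (cur.snp_id, prev.snp_id))
    return problems


map_lines = [
    "1 snp1 0.000 5000650 0 1",
    "1 snp2 0.012 5000830 0 1",
    "1 snp3 0.024 5000835 0 1",
]
entries = parse_map(map_lines)
problems = check_map(entries, n_ref_rows=3)
print(problems)
```

The single-chromosome check reflects the requirement that each HARSH run covers one chromosome.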
<-- myfile.hap -->
10
10
11
00
01
00
11
00
11
10
10
10
00
00
01
10
00
01
10
00
Usage of the BAM/VCF convertor
python convertor.py -o outputFile -v xxx.vcf -b xxx.bam
(Replace xxx with the actual file name. Python 2.7 is required.)