How to Run CNVem
Step 1. Short read mapping
We utilize the read mapping uncertainty to precisely recover the boundaries and copy number of the CNVs. To achieve the goal, we require the Mappers to report all mapping locations for each read. Not all mappers are designed to implement this function, as many of them only report one of the most possible positions. Here we adopt mrsFast (http://mrsfast.sourceforge.net/Home).
split –l 2000000 read_file_name fileName
To map the reads to the reference, we need to generate an index for the reference, the command is:
bin/mrsfast --index ref_seq.data
bin/mrsfast --search ref_seq.data -e 2 -seq fileNameaa -o map/fileNameaa.sam -u map/fileNameaa.nohit
Step 2: Preprocess the mapping information
The core of CNVem is an E-M algorithm, which runs many iterations. To speed up, we preprocess the mapping information. The majority of the reads map to only one unique position, and there is no uncertainty for these reads in E-M algorithm. So we can first calculate the depth of coverage from these unique mapping reads and only include the multiple mapping reads in the E-M algorithm. This will greatly improve the speed of CNVem.
bin/splitReads -path map/
out.uniq the mapping position of reads that map to a unique position out-score.uniq the mapping score of reads that map to a unique position out-chrom.uniq the chromosome of reads that map to a unique position out.multi the mapping positions of reads that map to multiple positions out-score.multi the mapping scores of reads that map to multiple positions out-chrom.multi the chromosome of reads that map to multiple positions
Step 3: Process the unique mapping reads
We now process the reads that map to a unique position. We will calculate the read Depth (DOC) contributed by these reads for each window.
bin/ConvertScore -um out.uniq -us out-score.uniq -wl 1000 -rs win_uni_score
-um the file of the mapping position of unique mapping reads -us the file of the mapping score of unique mapping reads -wl the window length -rs the output file that stores the read depth for each window
Step 4: Initially predict the CNVs
bin/detector_EM -wu win_uni_score -mm out.multi -ms out-score.multi -wl 1000 -cn initial
The options are:
-wu the file of DOC for each window, which is generated in step 3 -mm the file of the mapping positions of multiple mapping reads -ms the file of the mapping scores of multiple mapping reads -wl the window length -cn the prefix output file that stores the initial guess of copy numbers for each window. Results will be stored in initialcn_results.data
R CMD BATCH --no-save --no-restore '-args wl="1000" fname="initcn_result.data"' cnvd.r
The arguments are:
Step 5: Refine the prediction of CNVs
We set the window length to be 1 in order to make the prediction more precise.
bin/ConvertScore -um out.uniq -us out-score.uniq -wl 1 -rs win_uni_score
bin/detector_gs -wu win_uni_score -mm out.multi -ms out-score.multi -wl 1 -pr pred_cn
The final results will be stored in pred_cncn_result.data
R CMD BATCH --no-save --no-restore '--args wl="1" fname="pred_cncn_result.data"' cnvd.r