run.bwamem.sh
Simon Crameri edited this page on Apr 1, 2022.
Map single-end or paired-end reads of a batch of samples in parallel using BWA MEM (Li 2013), samtools (Li et al. 2009) and GNU parallel (Tange 2011).
run.bwamem.sh -s <file> -r <file> -e <string> -d <directory> -T <positive integer> -Q <positive integer> \
-o <directory> -t <positive integer>
Dependencies: bwa, sambamba, java, python, picard-tools, r
# Required
-s File with sample names, one per line (no header and no '>' prefix).
-r File with reference sequences (FASTA format)
-e Sample file extension(s), e.g. '.fasta' or '.trim.fq.gz' [unpaired] or '.trim1.fastq,.trim2.fastq' [paired].
Separate the two extensions of paired files with a ','; the program infers from this whether the data are unpaired or paired.
# Optional [DEFAULT]
-d [pwd] Path to the directory containing the input reads.
-T [10] Minimum bwa mem alignment score, passed to the -T parameter of bwa mem.
-Q [20] Minimum mapping quality, used to filter reads by the fifth (MAPQ) field of the BAM records. Must satisfy Q >= T.
-o [pwd] Path to the output directory. The directory is created if it does not exist.
-t [3] Number of samples processed in parallel, between 1 and 6.
Each sample uses ${cpu} CPU cores (cpu=4), so a run uses t*${cpu} cores in total: 4 for -t 1, 12 for the default -t 3 and 24 for -t 6.
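The way -e switches between unpaired and paired input can be sketched as follows. This is a minimal illustration, not code taken from run.bwamem.sh: the helper names `read_mode` and `read_files` are hypothetical, and the sketch assumes that exactly one comma separates the two extensions of a pair.

```shell
# Hypothetical sketch of how an -e value could be interpreted;
# not taken from run.bwamem.sh.

# read_mode EXT: print "paired" for two comma-separated extensions,
# "unpaired" for a single extension, and fail on anything else.
read_mode() {
  case "$1" in
    *,*,*) echo "error: -e holds more than two extensions" >&2; return 1 ;;
    ?*,?*) echo paired ;;
    *,*)   echo "error: -e holds an empty extension" >&2; return 1 ;;
    ?*)    echo unpaired ;;
    *)     echo "error: -e is empty" >&2; return 1 ;;
  esac
}

# read_files NAME EXT: print the read file name(s) expected for one
# sample basename, one per line (first extension first for pairs).
read_files() {
  mode=$(read_mode "$2") || return 1
  if [ "$mode" = paired ]; then
    # ${2%%,*} keeps the part before the comma, ${2#*,} the part after it
    printf '%s%s\n' "$1" "${2%%,*}" "$1" "${2#*,}"
  else
    printf '%s%s\n' "$1" "$2"
  fi
}
```

For example, `read_files SH598_S16 '.trim1.fastq.gz,.trim2.fastq.gz'` prints `SH598_S16.trim1.fastq.gz` and `SH598_S16.trim2.fastq.gz`, which is the pairing described for -s.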
-s The sample file must contain the sample basenames, i.e. the sample file names without the extension(s) given in -e.
Paired reads stored in two files take a single line: for 'SH598_S16.trim1.fastq.gz' and 'SH598_S16.trim2.fastq.gz',
the basename is 'SH598_S16' if -e is set to '.trim1.fastq.gz,.trim2.fastq.gz'.
-Q Any value above 0 removes reads with multiple mappings, because BWA assigns MAPQ 0 to reads that map equally well to
several locations. Only reads passing the -Q filter are written to the output directory, after PCR duplicates have been
removed with the picard MarkDuplicates tool.
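The -Q rule keeps an alignment only if its fifth field (MAPQ) is at least Q. The sketch below applies that rule to SAM text with awk so it can be tried without any alignment tools; it is an illustration under that assumption, not the script's implementation (with samtools, for example, the same filter is typically written `samtools view -b -q 20 in.bam`). The function name `mapq_filter` is hypothetical.

```shell
# Hypothetical sketch of the -Q rule; not taken from run.bwamem.sh.
# mapq_filter Q: read SAM text on stdin, keep all header lines (@...)
# and every alignment whose MAPQ (tab-separated field 5) is >= Q.
mapq_filter() {
  awk -F '\t' -v q="$1" '/^@/ || $5 >= q + 0'
}

# Example input: one header line, one alignment with MAPQ 0
# (e.g. a multi-mapping read) and one with MAPQ 60.
# printf '@HD\tVN:1.6\nr1\t0\tchr1\t100\t0\t4M\t*\t0\t0\tACGT\tIIII\nr2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n' | mapq_filter 20
```

With Q=20 the header and `r2` are kept and `r1` is dropped; with Q=0 every record passes, which is why only values above 0 remove multi-mapping reads.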
The output directory contains a subfolder ${name} for each sample with these main files:
- bwa.Q${Q}.log Log file.
- ${name}.bwa-mem.sorted.Q${Q}.nodup.bam Sorted BAM file with reads passing the -Q filter, PCR duplicates removed (MAIN RESULT).
- ${name}.bwa-mem.sorted.Q${Q}.nodup.bam.bai BAI index of the main BAM file.
- ${name}.flagstats.txt Flagstat statistics of all mapped reads (before filtering).
- ${name}.Q${Q}.nodup.flagstats.txt Flagstat statistics of reads passing the -Q filter, PCR duplicates removed.
- ${name}.dupstats.txt PCR duplicate statistics.
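Because every output name is built from ${name} and ${Q}, the main result of each sample can be located directly, e.g. to feed a downstream step. The sketch below assumes the layout <output directory>/${name}/<file> described above; `main_bam` and `list_main_bams` are hypothetical helpers, and the script itself may construct its paths differently.

```shell
# main_bam OUTDIR NAME Q: print the path of the main result BAM for one
# sample, following the naming pattern listed above (assumed layout).
main_bam() {
  printf '%s/%s/%s.bwa-mem.sorted.Q%s.nodup.bam\n' "$1" "$2" "$2" "$3"
}

# list_main_bams OUTDIR Q SAMPLEFILE: print the main BAM of every sample
# in SAMPLEFILE that exists; report missing ones on stderr.
list_main_bams() {
  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue          # skip empty lines
    bam=$(main_bam "$1" "$name" "$2")
    if [ -f "$bam" ]; then
      printf '%s\n' "$bam"
    else
      echo "missing: $bam" >&2
    fi
  done < "$3"
}
```

For instance, `list_main_bams out 20 samples.txt > bams.txt` would collect the Q20 results of a finished run whose output directory is `out`.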
run.bwamem.sh -s samples.txt -r ref.fasta -e '.trim1.fastq.gz,.trim2.fastq.gz' -T 10 -Q 20 -d NS-run1_trimmed -t 4
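The samples.txt used in this example could be generated from the trimmed reads by stripping the first -e extension from each R1 file name. A minimal sketch, assuming all R1 files sit directly in the -d directory; `make_sample_file` is a hypothetical helper, not part of CaptureAl.

```shell
# make_sample_file DIR EXT1: print one basename per file DIR/*EXT1,
# i.e. the sample names expected by -s (one per line).
make_sample_file() {
  for f in "$1"/*"$2"; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f" "$2"        # strip directory and extension
  done
}

# Usage for the example above (R1 extension only):
# make_sample_file NS-run1_trimmed .trim1.fastq.gz > samples.txt
```

Using only the first extension gives one line per read pair, as -s requires.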
Simon Crameri (ETHZ) and Stefan Zoller (GDC)
- Li, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997.
- Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.
- Tange, O. 2011. GNU Parallel - The command-line power tool. ;login: The USENIX Magazine 36:42-47.
CaptureAl v0.1 Documentation