
run.bwamem.sh

Simon Crameri edited this page Apr 1, 2022 · 12 revisions

Description

Map single-end or paired-end reads of a batch of samples in parallel using BWA MEM (Li 2013), samtools (Li et al. 2009) and GNU parallel (Tange 2011).

Usage

run.bwamem.sh -s <file> -r <file> -e <string> -d <directory> -T <positive integer> -Q <positive integer> \
              -o <directory> -t <positive integer>

Dependencies

bwa
sambamba
java
python
picard-tools
r

Arguments

# Required
-s        File with sample names (without header or '>')
-r        File with reference sequences (FASTA format)
-e        Sample file extension(s). E.g. '.fasta' or '.trim.fq.gz' [unpaired] or '.trim1.fastq,.trim2.fastq' [paired].
          The program then interprets if data is unpaired or paired. Separate file extensions of file pairs with a ','.

# Optional [DEFAULT]
-d  [pwd] Path to input reads.
-T  [10]  Minimum bwa mem alignment score, passed to -T parameter of bwa mem.
-Q  [20]  Minimum mapping quality; alignments are filtered on the MAPQ column (fifth field) of the BAM file. Q must be >= T.
-o  [pwd] Path to output directory. A folder will be created if it does not exist.
-t  [3]   Number of samples processed in parallel.
          Can be between 1 (uses ${cpu} CPU cores in total) and 6 (uses 6*${cpu} CPU cores in total, cpu=4).
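The relationship between -t and total CPU use, and the documented constraints on -t, -T and -Q, can be sketched as follows. This is a minimal illustration and not the script's actual code: the function names total_cores and check_args are hypothetical, and cpu=4 is taken from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not taken from run.bwamem.sh itself.

# Total CPU cores = samples processed in parallel (-t) * cores per sample (cpu).
# cpu defaults to 4, as stated in the argument description.
total_cores() {
    local t="$1" cpu="${2:-4}"
    echo $(( t * cpu ))
}

# Succeed only if 1 <= t <= 6 and Q >= T, as documented above.
check_args() {
    local t="$1" T="$2" Q="$3"
    [ "$t" -ge 1 ] && [ "$t" -le 6 ] && [ "$Q" -ge "$T" ]
}

total_cores 3    # default -t 3 with cpu=4, prints 12
```

With the defaults (-t 3, -T 10, -Q 20), check_args 3 10 20 succeeds and the run would occupy 12 cores under these assumptions.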

Details

-s  The sample file must contain the sample basenames (i.e., sample name without file extensions specified in -e).
    Use a single line for paired reads in two files, e.g. the sample basename for files 'SH598_S16.trim1.fastq.gz'
    and 'SH598_S16.trim2.fastq.gz' would be 'SH598_S16' if -e is set to '.trim1.fastq.gz,.trim2.fastq.gz'.

-Q  A value above 0 filters out reads with multiple mappings. Only reads that pass the -Q filter are written to
    the output directory, after PCR duplicates have been removed with the picard MarkDuplicates tool.
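To illustrate what filtering on the fifth field means, the sketch below applies a MAPQ threshold to plain SAM text with awk. It is only an illustration of the principle: the script itself works on BAM files with the tools listed under Dependencies, and the function name filter_mapq and the toy records are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a MAPQ filter on SAM text (not the script's own code).

# Keep header lines (starting with '@') and alignments whose fifth
# tab-separated field (MAPQ) is at least the threshold given as $1.
filter_mapq() {
    awk -F '\t' -v q="$1" '/^@/ || $5 >= q'
}

# Two toy alignments: r1 has MAPQ 60, r2 has MAPQ 0.
printf '@SQ\tSN:ref\tLN:100\nr1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\nr2\t0\tref\t5\t0\t4M\t*\t0\t0\tACGT\tIIII\n' \
    | filter_mapq 20
```

With a threshold of 20, the header line and r1 are kept, while r2 (MAPQ 0) is dropped.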

Value

An output directory with a subfolder ${name} for each sample, containing these main files:

- bwa.Q${Q}.log                               Log file.
- ${name}.bwa-mem.sorted.Q${Q}.nodup.bam      BAM file with reads mapped with high quality (MAIN RESULT).
- ${name}.bwa-mem.sorted.Q${Q}.nodup.bam.bai  BAI index file.
- ${name}.flagstats.txt                       Flagstat summary of all mapped reads.
- ${name}.Q${Q}.nodup.flagstats.txt           Flagstat summary of reads mapped with high quality.
- ${name}.dupstats.txt                        PCR duplicates statistics.
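The file naming scheme above can be expanded for a concrete sample, for instance to check after a run that nothing is missing. The following is a minimal sketch: the function name expected_outputs is hypothetical, and the file names are copied from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the main output files expected for one sample.
# $1 = output directory (-o), $2 = sample basename, $3 = mapping quality (-Q)
expected_outputs() {
    local out="$1" name="$2" Q="$3"
    local d="${out}/${name}"
    printf '%s\n' \
        "${d}/bwa.Q${Q}.log" \
        "${d}/${name}.bwa-mem.sorted.Q${Q}.nodup.bam" \
        "${d}/${name}.bwa-mem.sorted.Q${Q}.nodup.bam.bai" \
        "${d}/${name}.flagstats.txt" \
        "${d}/${name}.Q${Q}.nodup.flagstats.txt" \
        "${d}/${name}.dupstats.txt"
}

# Report any expected file that does not exist (prints nothing if all are present).
expected_outputs . SH598_S16 20 | while IFS= read -r f; do
    [ -e "$f" ] || echo "missing: $f"
done
```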

Examples

run.bwamem.sh -s samples.txt -r ref.fasta -e '.trim1.fastq.gz,.trim2.fastq.gz' -T 10 -Q 20 -d NS-run1_trimmed -t 4
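Before running the example, the sample file can be generated from the input directory, and the read files implied by -s and -e can be listed for checking. The helpers below are a sketch under stated assumptions: make_sample_list and reads_for_sample are hypothetical names, not part of run.bwamem.sh, and the paired/unpaired decision simply follows the ',' rule described under Arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing input; not part of run.bwamem.sh.

# Print the unique sample basenames of all files in directory $1
# that end in extension $2 (e.g. the extension of the forward reads).
make_sample_list() {
    local dir="$1" ext="$2" f
    for f in "$dir"/*"$ext"; do
        [ -e "$f" ] || continue      # the glob matched nothing
        basename "$f" "$ext"
    done | sort -u
}

# Print the read file(s) expected for sample $1, given the -e value $2
# and the input directory $3. One extension = unpaired, two = paired.
reads_for_sample() {
    local name="$1" ext="$2" dir="${3:-.}"
    case "$ext" in
        *,*,*|'') echo "error: -e takes one or two extensions" >&2; return 1 ;;
        *,*) printf '%s\n' "${dir}/${name}${ext%%,*}" "${dir}/${name}${ext#*,}" ;;
        *)   printf '%s\n' "${dir}/${name}${ext}" ;;
    esac
}

# Example usage (paths as in the example above):
#   make_sample_list NS-run1_trimmed .trim1.fastq.gz > samples.txt
#   reads_for_sample SH598_S16 '.trim1.fastq.gz,.trim2.fastq.gz' NS-run1_trimmed
```

Deriving the basenames from the forward-read extension alone keeps one line per sample, which matches the single-line convention for paired reads described under Details.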

Authors

Simon Crameri (ETHZ) and Stefan Zoller (GDC)

References

  • Li, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997.
  • Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078-2079.
  • Tange, O. 2011. GNU Parallel: The command-line power tool. ;login: The USENIX Magazine 36:42-47.
