After manually reviewing odd SNPs in `het_coverage.normal.tsv` (picked because they sit in suspiciously small regions of CNLoH), I found that the suspicious SNPs share the characteristics below. These sites turn out to be false-positive heterozygous SNPs (they should be HOM ALT):
- (hg38) chr2:189728585: total coverage 8, ALT has 7, REF has 1 (QV11). A QV11 base should not pass the filter, so this site should be 100% VAF ALT (HOM ALT). This is a case of a false-positive REF base.
- (hg38) chr2:189731263: total coverage 7, ALT has 6, REF has 1 (QV11). Again, a QV11 base should not pass the filter and this site should be 100% VAF ALT (HOM ALT). Similar case; see Fig. 1.
- (hg38) chr2:189734985: 2 reads (missed PCR) are chimeric at pseudo-palindromic regions. This is more specific to the enzymatic shearing protocol we use, and I need to think more about how to filter these, maybe during BAM alignment.
- (hg38) chr4:39565930: 1 read is MAPQ0 (Fig. 3). I think sites with MAPQ0 reads should be treated as suspicious and not used for genotyping.
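
To make the two QV11 cases concrete, here is a small sketch of the arithmetic. It uses the standard Phred definition (error probability = 10^(-Q/10)) and the read counts listed above; the helper names `phred_to_error_prob` and `vaf` are mine for illustration and are not functions from this repo.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def vaf(alt: int, ref: int) -> float:
    """Variant allele fraction from ALT and REF read counts."""
    total = alt + ref
    if total == 0:
        raise ValueError("no reads at this site")
    return alt / total


# A QV11 base has roughly an 8% chance of being a miscall.
qv11_error = phred_to_error_prob(11)

# chr2:189728585: 7 ALT + 1 REF (QV11) looks heterozygous-ish,
# but with the single low-quality REF base dropped it is 7/7 ALT.
vaf_before = vaf(alt=7, ref=1)
vaf_after = vaf(alt=7, ref=0)

print(f"P(error | QV11) = {qv11_error:.3f}")
print(f"VAF with QV11 REF base kept:    {vaf_before:.3f}")
print(f"VAF with QV11 REF base dropped: {vaf_after:.3f}")
```

At coverage 7 to 8, a single miscalled base is enough to pull the site away from 100% VAF, which is why these sites showed up as heterozygous.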
Solutions:
- Do not use MAPQ0 reads for genotyping (branch `filter_mapq0`)
- Rely only on high quality bases for genotyping
- Sequence deeper whenever possible
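
As a rough illustration of the first two points, the sketch below counts alleles at one site while skipping MAPQ0 reads and low-quality bases. This is not the code on the `filter_mapq0` branch: the `Obs` record, the `count_alleles` function, the base-quality cutoff of 20, and the example bases and qualities are all assumptions I made up to show the idea on plain Python data.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Obs(NamedTuple):
    """One read's observation at a site (hypothetical record, for illustration)."""
    base: str       # base call at the site
    base_qual: int  # Phred base quality
    mapq: int       # mapping quality of the read


MIN_MAPQ = 1        # assumption: reject only MAPQ0 reads
MIN_BASE_QUAL = 20  # assumption: any cutoff above 11 rejects the QV11 bases


def count_alleles(
    observations: Iterable[Obs],
    min_mapq: int = MIN_MAPQ,
    min_base_qual: int = MIN_BASE_QUAL,
) -> Counter:
    """Count bases at a site, ignoring low-MAPQ reads and low-quality bases."""
    counts: Counter = Counter()
    for obs in observations:
        if obs.mapq < min_mapq:
            continue  # ambiguous alignment, do not use for genotyping
        if obs.base_qual < min_base_qual:
            continue  # unreliable base call
        counts[obs.base.upper()] += 1
    return counts


# Shaped like chr2:189728585: 7 ALT reads plus 1 REF read at QV11.
# The actual bases and the ALT qualities are made up for the example.
pileup = [Obs("T", 35, 60)] * 7 + [Obs("C", 11, 60)]
counts = count_alleles(pileup)
print(dict(counts))  # only the 7 ALT observations remain

# A MAPQ0 read is dropped even if its base quality is high.
with_mapq0 = pileup + [Obs("C", 38, 0)]
print(dict(count_alleles(with_mapq0)))
```

With both filters on, the example site is counted as 7/7 ALT, so it would be genotyped as HOM ALT rather than reported as a heterozygous SNP.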
Fig. 1

Fig. 2

Fig. 3
