The current implementation outputs VAR and REF read counts for non-synonymous variants only. It would be great, as a user, to have the option to output read-support counts for all variants. I've used Varlens to get around this limitation in Isovar, but that route has its own limitations, which I'll discuss below.
Per a conversation w/ Alex:
Hey John,
I looked a little bit and found that on line 67 of isovar.effect_prediction I'm doing the following:
nonsynonymous_coding_effects = effects.drop_silent_and_noncoding()
Do you want me to make this optional for the purposes of counting variant reads and assembling variant sequences?
If so, can you file an issue on the repo? https://github.com/openvax/isovar/issues
Eliminating the hard filter for non-synonymous variants affords the user a bit of added flexibility, but would necessitate additional descriptors for each variant to enable filtering to variant classes of interest. I think two additional columns, "Effect_Class" and "Effect" would solve the filtering problem and make working with the isovar output relatively easy.
I believe two columns may be required largely because of my experience working with Varlens. The Varlens output has an "effect" column that describes the specific coding effect of a variant (e.g. p.G12D). However, I've found this difficult to work with in practice because, AFAIK, there is no easy way to tell apart non-synonymous SNVs ("p.G12D"), in-frame indels ("p.HDVPS811del") and frameshifts ("p.A117fs"). It may be better to keep the effect class ("Exon, non-synonymous") in a column separate from the descriptor of the specific effect (p.G12D).
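To illustrate the parsing problem, here is a rough sketch of what a user currently has to do to recover a class from the effect string alone. The regular expressions are my own guesses based only on the three example strings above, and `classify_effect` is a hypothetical helper, not part of Varlens or Isovar. Anything that doesn't look like those examples falls through to "other", which is exactly why a dedicated class column would be more robust.

```python
import re

# Patterns inferred from the example strings in this issue only;
# real effect strings may take forms that none of these cover.
_FRAMESHIFT = re.compile(r"^p\.[A-Z]\d+fs$")
_INFRAME_DELETION = re.compile(r"^p\.[A-Z]+\d+del$")
_SUBSTITUTION = re.compile(r"^p\.([A-Z])\d+([A-Z])$")


def classify_effect(effect):
    """Guess a coarse effect class from a short protein-effect string."""
    if _FRAMESHIFT.match(effect):
        return "frameshift"
    if _INFRAME_DELETION.match(effect):
        return "in-frame deletion"
    match = _SUBSTITUTION.match(effect)
    if match:
        ref_aa, alt_aa = match.groups()
        # Same residue before and after means no amino acid change.
        return "silent" if ref_aa == alt_aa else "substitution"
    return "other"


for example in ["p.G12D", "p.HDVPS811del", "p.A117fs"]:
    print(example, "->", classify_effect(example))
```

With an "Effect_Class" column in the output, none of this guessing would be needed.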
Ideally an effect class column would provide the same filtering as the current hard-coded isovar filters, or use the standard Ensembl classes.
- 3' UTR
- 5' UTR
- exonic-splice-site
- Incomplete
- Intergenic
- Intragenic
- Intronic
- intronic-splice-site
- non-coding-transcript
- Silent
- splice-acceptor
- Splice-donor
- Stop-loss
- Stop-gain
- Exon, Non-synonymous
The specific use case I have in mind is counting the total number of variants, the number of variants with RNA read support, and finally how the latter category breaks down by variant type (e.g. SNV, SNV w/ coding effect, indel, etc.).
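As a sketch of that use case, assuming the output had one record per variant with an effect class and an ALT read count, the summary could be computed along these lines. The field names (`effect_class`, `num_alt_reads`), the `summarize_read_support` helper and the example records are all made up for illustration; they are not the current Isovar output format.

```python
from collections import Counter


def summarize_read_support(records, min_alt_reads=1):
    """Return (total variants, variants with RNA support, supported counts per class)."""
    supported = [r for r in records if r["num_alt_reads"] >= min_alt_reads]
    by_class = Counter(r["effect_class"] for r in supported)
    return len(records), len(supported), by_class


# Made-up records, only to show the shape of the input.
example_records = [
    {"variant": "chr1 g.100A>T", "effect_class": "Exon, Non-synonymous", "num_alt_reads": 12},
    {"variant": "chr1 g.250C>G", "effect_class": "Silent", "num_alt_reads": 3},
    {"variant": "chr2 g.400G>A", "effect_class": "Intronic", "num_alt_reads": 0},
    {"variant": "chr3 g.900T>C", "effect_class": "Exon, Non-synonymous", "num_alt_reads": 0},
]

total, n_supported, by_class = summarize_read_support(example_records)
print(total, n_supported, dict(by_class))
```

The same grouping could be done on a variant-type column (SNV, indel, etc.) if one were available.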