Extend NMD detection and escape-rule determination to all variant sources (currently fusion-only) #116

@riasc

Motivation

NMD (nonsense-mediated decay) detection is currently wired only for fusion events. For SNV / indel (and exitron / alt-splicing) variants the NMD and NMD_escape_rule output columns come out empty.

NMD escape determines whether the truncated or frameshifted protein produced by a variant that introduces a PTC (premature termination codon) is actually translated and presentable. Indels and frameshifts account for a large share of ScanNeo2's neoantigen yield, so skipping NMD evaluation for them systematically over- or under-calls those neoantigens. This issue proposes extending NMD detection and escape-rule determination to all variant sources.

Current state

The machinery substantially exists already — this is "extend + wire up", not greenfield:

  • workflow/scripts/prioritization/effects.py has find_stop_codon, annotate_stop_codon, and check_escape. check_escape already returns escape-rule codes (e.g. 2/3/4, or -1 for no escape), but the PTC/escape block is exercised only on the fusion path.
  • The neoepitope output table already carries NMD, PTC_dist_ejc, PTC_exon_number, NMD_escape_rule columns; for non-fusion variants they are empty.
  • effects.py carries a # TODO: check for NMD out of VEP.

Approach

Two routes for the non-fusion variants:

  • (a) VEP NMD annotation — what the existing # TODO points at. Low effort, but VEP tends to give a yes/no rather than the specific escape rule.
  • (b) Reuse effects.py's own escape logic — feed the PTC genomic position and the transcript exon model (already loaded as self.exome) into find_stop_codon / check_escape for SNV / indel / exitron / alt-splicing variants, exactly as the fusion path already does.

Recommended: (b) — one uniform NMD path across all variant sources: transparent, consistent with the fusion path, and it produces the explicit escape rule (not just a yes/no).
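
To make route (b) concrete, here is a minimal sketch of the coordinate step it relies on: mapping a PTC genomic position onto a transcript exon model to obtain the exon number and the distance to the last exon-exon junction (the quantities behind PTC_exon_number and PTC_dist_ejc). The function name locate_ptc, the exon representation (1-based inclusive genomic intervals) and the distance convention are assumptions for illustration. The sketch does not call find_stop_codon or check_escape, whose signatures are not shown in this issue.

```python
from typing import List, Optional, Tuple

# Hypothetical exon representation: 1-based inclusive genomic (start, end).
Exon = Tuple[int, int]


def locate_ptc(exons: List[Exon], strand: str, ptc_pos: int) -> Optional[Tuple[int, int]]:
    """Return (exon_number, dist_to_last_ejc) for a PTC at genomic ptc_pos.

    exon_number is 1-based in transcript order. dist_to_last_ejc counts
    transcript nucleotides from ptc_pos to the first base of the last exon;
    it is zero or negative when the PTC lies inside the last exon.
    Returns None when ptc_pos does not fall inside any exon.
    """
    # Put exons in transcript order (reverse genomic order on the minus strand).
    ordered = sorted(exons, reverse=(strand == "-"))

    offset = 0          # transcript index of the first base of the current exon
    ptc_tx = None       # transcript index of the PTC position
    ptc_exon = None     # 1-based exon number containing the PTC
    for number, (start, end) in enumerate(ordered, start=1):
        if start <= ptc_pos <= end:
            within = ptc_pos - start if strand == "+" else end - ptc_pos
            ptc_tx = offset + within
            ptc_exon = number
        offset += end - start + 1

    if ptc_tx is None:
        return None

    last_start, last_end = ordered[-1]
    # Transcript index of the first base of the last exon (the last junction).
    last_ejc_tx = offset - (last_end - last_start + 1)
    return ptc_exon, last_ejc_tx - ptc_tx


if __name__ == "__main__":
    model = [(100, 199), (300, 399), (500, 599)]
    print(locate_ptc(model, "+", 380))  # PTC in exon 2, 20 nt upstream of the last junction
```

Whether ptc_pos should be the first base of the stop codon or some other anchor depends on what the fusion path already passes in. The sketch leaves that choice to the caller.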

Escape rules to cover

check_escape already encodes some of these; the standard set is:

  1. PTC in the last exon (no downstream exon-exon junction).
  2. PTC within ~50–55 nt of the last exon–exon junction.
  3. Start-proximal PTC (close to the start codon → translation re-initiation).
  4. Single-exon / intronless transcript.
  5. (optional) long last exon.
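
As a rough illustration of how rules 1 to 4 could be evaluated together, the sketch below classifies a PTC from a few precomputed quantities. Everything in it is an assumption for illustration: the EscapeRule names deliberately do not reproduce the numeric codes that check_escape returns, the 55 nt junction window is the upper end of the range quoted above, the 150 nt start-proximal window is a placeholder that would need to be chosen and justified, and the precedence order is one possible choice. Rule 5 is left out.

```python
from enum import Enum


class EscapeRule(Enum):
    """Hypothetical labels; not the numeric codes used by check_escape."""
    NONE = "no_escape"
    SINGLE_EXON = "single_exon_transcript"
    LAST_EXON = "ptc_in_last_exon"
    NEAR_LAST_EJC = "ptc_near_last_ejc"
    START_PROXIMAL = "start_proximal_ptc"


def classify_escape(
    n_exons: int,
    ptc_exon_number: int,
    dist_to_last_ejc: int,
    dist_from_start: int,
    ejc_window: int = 55,     # assumption: upper end of the ~50-55 nt range
    start_window: int = 150,  # assumption: placeholder start-proximal cutoff
) -> EscapeRule:
    """Return the first matching escape rule, or EscapeRule.NONE.

    dist_to_last_ejc: transcript nt from the PTC to the last exon-exon junction.
    dist_from_start:  transcript nt from the start codon to the PTC.
    """
    if n_exons == 1:
        return EscapeRule.SINGLE_EXON      # rule 4: no junction at all
    if ptc_exon_number == n_exons:
        return EscapeRule.LAST_EXON        # rule 1: no downstream junction
    if 0 < dist_to_last_ejc <= ejc_window:
        return EscapeRule.NEAR_LAST_EJC    # rule 2: too close to the last junction
    if dist_from_start <= start_window:
        return EscapeRule.START_PROXIMAL   # rule 3: possible re-initiation
    return EscapeRule.NONE


if __name__ == "__main__":
    print(classify_escape(n_exons=3, ptc_exon_number=2,
                          dist_to_last_ejc=20, dist_from_start=400))
```

Returning a single rule keeps the output column simple. If more than one rule can apply to the same PTC, reporting all matching rules would be an alternative design.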

The NMD_escape_rule codes are currently bare numbers — document which code maps to which rule.

Scope / acceptance

  • NMD / NMD_escape_rule populated for SNV, indel, exitron and alt-splicing variants — not just fusions.
  • A documented escape-rule code set.
  • Verify on a real run that the NMD columns are filled for the non-fusion variant types.
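
The verification step could be scripted along these lines. This is only a sketch: it assumes the neoepitope output is tab-separated and that a column identifies the variant source. The column name variant_type used below is hypothetical, while NMD and NMD_escape_rule are the column names given above.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

NMD_COLUMNS = ("NMD", "NMD_escape_rule")


def count_empty_nmd(rows: Iterable[Dict[str, str]],
                    type_column: str = "variant_type") -> Counter:
    """Count rows per variant type where any NMD column is missing or blank."""
    empty = Counter()
    for row in rows:
        if any(not (row.get(col) or "").strip() for col in NMD_COLUMNS):
            empty[row.get(type_column, "unknown")] += 1
    return empty


def count_empty_nmd_in_file(path: str, type_column: str = "variant_type") -> Counter:
    """Same check, reading a tab-separated output table from disk."""
    with open(path, newline="") as handle:
        return count_empty_nmd(csv.DictReader(handle, delimiter="\t"), type_column)


if __name__ == "__main__":
    demo = [
        {"variant_type": "fusion", "NMD": "True", "NMD_escape_rule": "2"},
        {"variant_type": "indel", "NMD": "", "NMD_escape_rule": ""},
        {"variant_type": "SNV", "NMD": "False", "NMD_escape_rule": "-1"},
    ]
    print(count_empty_nmd(demo))  # only the indel row has empty NMD columns
```

An empty Counter for the non-fusion types on a real run would indicate that the acceptance criterion is met.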
