v0.2.1: wire the relay megakernel into from_dem auto-dispatch (GPU-validate the public-API path)

v0.2 merged the megakernel as standalone backend classes (tridec.backends.megakernel.{BpMegaTriton,RelayBpMegaTriton}), validated directly on Metal/CUDA/ROCm with receipts. They are NOT yet wired into the public from_dem/RelayBpDecoder dispatch; that path still uses the two-kernel RelayBpTriton.

Design (validated finding): on a GPU backend, RELAY should default to the megakernel (9-22x faster, identical correctness); plain BP should STAY two-kernel (the BP megakernel loses at large batch because it has no early-exit lever). So: RelayBpDecoder(backend='triton'|'metal') -> mega by default; BpDecoder -> two-kernel; expose a kernel='mega'|'two-kernel'|'auto' override.
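The selection rule above could be sketched roughly as follows. This is not tridec's actual dispatch code: `resolve_kernel`, the `decoder` strings `"relay"`/`"bp"`, and the `GPU_BACKENDS` set are hypothetical names used only to illustrate the proposed default and the override. The sketch also assumes an explicit `kernel` value is honored as given, which the issue does not specify.

```python
# Hypothetical sketch of the proposed kernel selection; not tridec's real API.
GPU_BACKENDS = {"triton", "metal"}          # backends named in the design note
VALID_KERNELS = {"mega", "two-kernel", "auto"}


def resolve_kernel(decoder: str, backend: str, kernel: str = "auto") -> str:
    """Return 'mega' or 'two-kernel' for a decoder/backend pair.

    decoder: 'relay' or 'bp' (illustrative labels for RelayBpDecoder / BpDecoder).
    backend: backend string as passed to the decoder, e.g. 'triton' or 'metal'.
    kernel:  explicit override, or 'auto' to apply the default rule.
    """
    if kernel not in VALID_KERNELS:
        raise ValueError(f"kernel must be one of {sorted(VALID_KERNELS)}, got {kernel!r}")
    if kernel != "auto":
        # Assumption: an explicit override wins over the default.
        return kernel
    if decoder == "relay" and backend in GPU_BACKENDS:
        # Relay on a GPU backend defaults to the megakernel.
        return "mega"
    # Plain BP (and any non-GPU backend) stays on the two-kernel path.
    return "two-kernel"
```

Under this rule, `resolve_kernel("relay", "triton")` selects the megakernel, `resolve_kernel("bp", "triton")` stays two-kernel, and passing `kernel="two-kernel"` keeps relay on the old path for comparison runs.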

Gate before shipping: re-run the relay gates THROUGH from_dem(...).decode_batch (not just the standalone classes) on a CUDA or ROCm GPU. The dispatch wiring is the untested surface. This needs a GPU session; do not flip the default without it.
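One way to frame part of that gate is a parity check between the public path and the standalone class on the same batch. The helper below is only a sketch: `mismatched_shots` is a hypothetical name, both decoders are passed in as plain callables (for example a bound `decode_batch`), and comparing outputs shot by shot is an assumption about what the gate should assert, not a description of the existing relay gates.

```python
from typing import Callable, List, Sequence

# Hypothetical type alias: a batch decoder maps a batch of syndromes
# to one correction (a sequence of bits) per shot.
Batch = Sequence[Sequence[int]]


def mismatched_shots(
    decode_public: Callable[[Batch], Batch],
    decode_direct: Callable[[Batch], Batch],
    syndromes: Batch,
) -> List[int]:
    """Return the indices of shots where the two decode paths disagree."""
    public = decode_public(syndromes)
    direct = decode_direct(syndromes)
    if len(public) != len(direct):
        raise ValueError(
            f"batch size mismatch: public={len(public)} direct={len(direct)}"
        )
    # Convert each row to a list so array-like rows compare element-wise.
    return [
        i
        for i, (p, d) in enumerate(zip(public, direct))
        if list(p) != list(d)
    ]
```

In a GPU session, `decode_public` would be the decoder built via from_dem and `decode_direct` the standalone megakernel class; an empty result means the dispatch wiring reproduced the standalone output on that batch.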
