Nils Matteson matteso1

Senior @ UW-Madison, CS + Data Science (graduating May 2026). Moving to the Bay Area in fall 2026 to start a Master's in CS @ Northeastern. Solo founder of thaw, building the fork primitive for live LLM inference: open-source, pip-installable, and integrated with vLLM and SGLang. Background in ML reliability (uncertainty quantification, OOD detection, conformal prediction), with systems work on the side.

Some things I've built:

thaw: snapshot a running LLM session (weights + KV cache + scheduler state + prefix-hash table) and hydrate it into N children that skip prefill and diverge from the fork point. Headline benchmark: ForkPool init takes 22.3s once, then 0.88s median per round across 5 rounds × 4 branches × 64 tokens on H100 + Llama-3.1-8B, all bit-identical at the fork boundary. 70B TP=2 sleep/wake on 2×H100 is also bit-identical, with 145 GiB freed via CuMemAllocator. ~9,400 LOC Rust core across 5 crates + ~3,000 LOC Python; now also runs on Apple Silicon (MLX). On PyPI as thaw-vllm / thaw-native.
ProjectGorgon: Medusa-style speculative decoding for Llama-3-8B with custom CUDA kernels. 5-head architecture, per-head loss weighting (λ_k = 0.8^k), trained on UltraChat 200k. Taught myself GPU programming for this; it led directly to thaw.
madison-bus-eta (live): real-time arrival predictions for all 29 Madison Metro routes. XGBoost on 47 features, Mondrian conformal prediction (90% coverage guarantee), nightly retraining with deployment gates, DeckGL map, bus-bunching detection. 35% more accurate than the official API.
KohakuRAG_UI: RAG system for AI sustainability Q&A, deployed on AWS Bedrock. Built for UW's Research Cyberinfrastructure group.
sentinel: Kafka-inspired distributed log streaming engine in Go. Custom LSM-tree storage, skip list memtable, Raft consensus, gRPC wire protocol. 1.7M ops/sec on the skip list.
lockbox: zero-trust, air-gapped password manager. AES-256-GCM + Argon2id, TOTP 2FA, compiles to a single .exe. No cloud, no telemetry, no network calls.
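
For context on the Mondrian conformal prediction mentioned under madison-bus-eta, here is a minimal sketch of the general technique. It is not the project's code. It assumes absolute-residual scores and a grouping key (for example, a route ID); the function names `mondrian_quantiles` and `predict_interval` are hypothetical. The idea is to calibrate a separate residual quantile per group, so the coverage target (90% when `alpha=0.1`) holds within each group rather than only on average.

```python
import math
from collections import defaultdict

def mondrian_quantiles(groups, y_true, y_pred, alpha=0.1):
    """Per-group conformal quantile of absolute residuals on a calibration set."""
    scores = defaultdict(list)
    for g, y, p in zip(groups, y_true, y_pred):
        scores[g].append(abs(y - p))
    quantiles = {}
    for g, s in scores.items():
        s.sort()
        n = len(s)
        # Finite-sample rank used by split conformal prediction.
        k = math.ceil((n + 1) * (1 - alpha))
        # With too few calibration points the interval is unbounded.
        quantiles[g] = s[k - 1] if k <= n else math.inf
    return quantiles

def predict_interval(quantiles, group, y_pred):
    """Symmetric interval around a point prediction for the given group."""
    half_width = quantiles[group]
    return (y_pred - half_width, y_pred + half_width)
```

A group with very few calibration points gets an infinite half-width, which is the honest answer under this scheme: the guarantee cannot be met with a finite interval until more data arrives.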

Talk: Deploying RAG in Bedrock vs. Local — ML+X Forum, Feb 2026.

Currently full-time on thaw. Open to conversations with people working on inference infrastructure, agent systems, LLM RL post-training, or anything composability-related at the GPU-state layer. Reach me at nils@thaw.sh.

thaw.sh · linkedin · github
