thaw (@thaw-ai)

Building open-source GPU state management for LLM inference: 17x faster cold starts via pipelined DMA, KV cache snapshots, and multi-GPU tensor parallelism.
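The profile gives no code, but the cold-start claim rests on a well-known pattern: overlapping host-to-device copies across several CUDA streams so DMA transfers pipeline instead of serializing on the default stream. Below is a minimal PyTorch sketch of that general technique, not thaw's actual implementation; `pipelined_restore`, `host_chunks`, and `device_buffers` are illustrative names.

```python
import torch

def pipelined_restore(host_chunks, device_buffers, num_streams=4):
    """Overlap host-to-device copies across CUDA streams.

    host_chunks must be pinned (page-locked) CPU tensors so each
    copy_ can run as an asynchronous DMA transfer; round-robining
    chunks over several streams lets the transfers pipeline.
    """
    streams = [torch.cuda.Stream() for _ in range(num_streams)]
    for i, (src, dst) in enumerate(zip(host_chunks, device_buffers)):
        with torch.cuda.stream(streams[i % num_streams]):
            dst.copy_(src, non_blocking=True)
    for stream in streams:
        stream.synchronize()

if torch.cuda.is_available():
    # Restore four 256 MiB float32 shards (64 Mi elements each).
    host = [torch.empty(64 * 2**20, pin_memory=True) for _ in range(4)]
    dev = [torch.empty(64 * 2**20, device="cuda") for _ in range(4)]
    pipelined_restore(host, dev)
```

Pinned memory is the enabling detail here: copies from pageable memory are staged through an internal driver buffer and cannot fully overlap, which defeats the pipelining.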

Popular repositories

  1. thaw (Public)

    Fast snapshot/restore for LLM inference: 8x faster cold starts, multi-GPU tensor parallelism, and KV cache snapshots (a rough snapshot sketch follows this list).

    Python · 3 stars

  2. vllm (Public, forked from vllm-project/vllm)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python
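For the KV cache snapshot feature mentioned above, here is a rough sketch of what capturing the cache into pinned host buffers can look like in plain PyTorch, assuming the cache is exposed as per-layer (key, value) GPU tensor pairs. This is not thaw's API; `snapshot_kv_cache` and `kv_blocks` are invented names.

```python
import torch

def snapshot_kv_cache(kv_blocks):
    """Copy per-layer (key, value) GPU tensors into pinned host buffers.

    Pinned destinations keep the device-to-host copies asynchronous,
    so a later cold start can DMA the snapshot back to the GPU
    instead of recomputing the prefill.
    """
    snapshot = []
    for key, value in kv_blocks:
        host_k = torch.empty(key.shape, dtype=key.dtype, pin_memory=True)
        host_v = torch.empty(value.shape, dtype=value.dtype, pin_memory=True)
        host_k.copy_(key, non_blocking=True)
        host_v.copy_(value, non_blocking=True)
        snapshot.append((host_k, host_v))
    torch.cuda.synchronize()  # make sure every copy has landed
    return snapshot
```

A restore would then stream these buffers back with the pipelined-copy pattern shown earlier.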

