Commit aa22944

readme

1 parent 2c33c35 commit aa22944

2 files changed: 145 additions & 9 deletions

README.md

Lines changed: 144 additions & 2 deletions
# Overmind

**Cut PyTorch model loading time from 15s to 0.2s with zero-copy shared memory caching.**
[![PyPI version](https://badge.fury.io/py/overmind-cache.svg)](https://badge.fury.io/py/overmind-cache)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

Overmind is a non-intrusive caching library that dramatically speeds up PyTorch model loading by storing serialized models in shared memory. Once a model is loaded, subsequent loads from any process take milliseconds instead of seconds.
Named after the [Overmind from StarCraft](https://starcraft.fandom.com/wiki/Overmind), it coordinates model caching across processes like the Overmind coordinates the Zerg Swarm.

Note that the package name on PyPI is `overmind-cache`, since `overmind` is taken.
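The core trick, placing a tensor's bytes in named shared memory so that any process can map them back without copying, can be sketched with Python's standard `multiprocessing.shared_memory` module. This is a minimal illustration (the segment name `demo_tensor` is made up), not Overmind's actual format:

```python
# Minimal sketch of zero-copy tensor sharing; not Overmind's real code.
import numpy as np
import torch
from multiprocessing import shared_memory

# Producer: copy a tensor's bytes into a named shared-memory segment once.
src = torch.randn(4, 4)
arr = src.numpy()
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes, name="demo_tensor")
np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr

# Consumer (possibly another process): attach and view the same bytes.
peer = shared_memory.SharedMemory(name="demo_tensor")
view = torch.from_numpy(np.ndarray((4, 4), dtype=np.float32, buffer=peer.buf))
assert torch.equal(view, src)  # same bytes, no copy made on attach

# Cleanup omitted for brevity: drop `view`, close() both handles, unlink().
```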
## Features

- **Fast model loading** - First load caches to shared memory; subsequent loads are ~5x faster
- **Process-agnostic** - Cache persists across process restarts via a background server
- **Non-intrusive** - Just add one line of code; no changes to model loading logic
- **Memory efficient** - Multiple processes share the same cached tensors in memory
- **Broad compatibility** - Works with `diffusers`, `transformers`, `bitsandbytes` quantization, and vanilla `torch.load`
## Installation

```bash
pip install overmind-cache
```

Or install from source:

```bash
git clone https://github.com/taichi-dev/overmind.git
cd overmind
pip install -e .
```
## Quick Start

### Option 1: Monkey Patching (Recommended)

Add a single line at the top of your script to automatically accelerate all supported model loading:

```python
import torch

import overmind.api
overmind.api.monkey_patch_all()

# Your existing code works unchanged!
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipeline.to('cuda')
# First run: ~24s
# Subsequent runs: ~1s (mostly spent in .to('cuda'))
```
### Option 2: Explicit API

If you'd rather not monkey-patch, use the `load` function directly:

```python
import torch

from overmind.api import load
from diffusers import DiffusionPipeline

pipeline = load(
    DiffusionPipeline.from_pretrained,
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
```
## Supported Libraries

Overmind automatically patches these loading functions:

| Library          | Functions |
|------------------|-----------|
| **Diffusers**    | `DiffusionPipeline.from_pretrained`, `ModelMixin.from_pretrained`, `SchedulerMixin.from_pretrained`, `FromSingleFileMixin.from_single_file` |
| **Transformers** | `PreTrainedModel.from_pretrained`, `PreTrainedTokenizerBase.from_pretrained`, `AutoProcessor.from_pretrained`, `pipeline` |
| **PyTorch**      | `torch.load`, `torch.jit.load` |
| **Safetensors**  | `safetensors.torch.load_file` |
| **TorchVision**  | `vgg16`, `vgg19` |
| **OpenCLIP**     | `create_model_and_transforms` |

### Custom Patch Points

Create an `overmind.cfg` file in your package root to add custom patch points, one `module::attribute` spec per line:

```
# overmind.cfg
mylib.models::MyModel.from_pretrained
mylib.utils::load_checkpoint
```
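For illustration, a spec in this `module::attribute` format could be resolved with standard `importlib` machinery. The sketch below is hypothetical (the `resolve` and `patch` helper names are invented, and this is not Overmind's internal code):

```python
# Hypothetical sketch: resolving a "pkg.module::Obj.attr" patch-point spec
# and swapping in a caching wrapper. Not Overmind's actual internals.
import importlib

def resolve(spec: str):
    """Return the object holding the target attribute, plus the attribute name."""
    module_path, _, qualname = spec.partition("::")
    holder = importlib.import_module(module_path)
    *parents, attr = qualname.split(".")
    for name in parents:
        holder = getattr(holder, name)
    return holder, attr

def patch(spec: str, cached_call):
    """Replace the target callable with a wrapper that consults the cache first."""
    holder, attr = resolve(spec)
    original = getattr(holder, attr)
    setattr(holder, attr, lambda *args, **kwargs: cached_call(original, *args, **kwargs))
```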
## CLI Commands

```bash
# Start the server manually (usually auto-started)
overmind-server

# Start as a daemon
overmind-server --daemon

# List cached models
overmind-list

# Shut down the server (clears the cache)
overmind-shutdown
```
## Environment Variables

| Variable                  | Description |
|---------------------------|-------------|
| `OVERMIND_DISABLE`        | Set to any value to disable Overmind, falling back to a local cache |
| `OVERMIND_NO_LOCAL_CACHE` | Disable local caching as well |
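For example, Overmind can be opted out of from Python itself. Setting the variable before the first `overmind` import is an assumption about when it is read; exporting it in the shell before launching is the safe route:

```python
# Hedged example: disable Overmind for this process. Whether the variable
# must be set before importing overmind is an assumption, not documented.
import os
os.environ["OVERMIND_DISABLE"] = "1"  # any value disables Overmind

import overmind.api
overmind.api.monkey_patch_all()  # falls back to the local cache per the table above
```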
## Benchmarks

Loading a Stable Diffusion ControlNet pipeline with VAE, using [demo-vae.py](demo-vae.py), on Linux + Intel i9-11900K + RTX 4090:

| Run               | `vae` | `depth` | `edge` | `pipeline` | `to('cuda')` | Total  |
|-------------------|-------|---------|--------|------------|--------------|--------|
| w/o Overmind      | 1.18s | 0.98s   | 1.41s  | 1.65s      | 0.91s        | 6.16s  |
| w/ Overmind (1st) | 5.44s | 5.17s   | 5.41s  | 7.29s      | 0.86s        | 24.20s |
| w/ Overmind (2nd) | 0.00s | 0.01s   | 0.01s  | 0.20s      | 0.87s        | 1.12s  |

The first load with Overmind is slower due to pickling overhead. Subsequent loads are **5-6x faster** than without Overmind, with the only remaining cost being the `to('cuda')` transfer.
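The per-stage numbers above could be reproduced with a simple wall-clock wrapper around each loader call; the `timed` helper below is illustrative and not part of Overmind:

```python
# Illustrative timing helper (not shipped with Overmind): wraps a zero-arg
# loader and prints its wall-clock duration, one line per table column.
import time

def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Usage, mirroring demo-vae.py's (lambda: ...)() loading style:
#   vae = timed("vae", lambda: AutoencoderKL.from_pretrained(...))
```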
## License

Apache 2.0

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

Developed by [Taichi Graphics](https://github.com/taichi-dev) for production AI inference workloads.

demo-vae.py

Lines changed: 1 addition & 7 deletions

The demo no longer passes `device_map='cuda'` to each loader; instead it moves the assembled pipeline to CUDA with a single `to('cuda')` call:

```diff
@@ -17,20 +17,16 @@ def load_pipeline():
         "lemon2431/ChineseInkComicStrip_v10",
         subfolder="vae",
         torch_dtype=torch.float16,
-        device_map='cuda',
     ))()
-    print(vae.device)
     controlnet_depth = (lambda: ControlNetModel.from_pretrained(
         "lllyasviel/control_v11f1p_sd15_depth",
         torch_dtype=torch.float16,
         variant="fp16",
-        device_map='cuda',
     ))()
     controlnet_edge = (lambda: ControlNetModel.from_pretrained(
         "lllyasviel/control_v11p_sd15_softedge",
         torch_dtype=torch.float16,
         variant="fp16",
-        device_map='cuda',
     ))()

     pipeline = (lambda: StableDiffusionControlNetPipeline.from_pretrained(
@@ -39,10 +35,8 @@ def load_pipeline():
         vae=vae,
         torch_dtype=torch.float16,
         safety_checker=None,
-        device_map='cuda',
     ))()

-    # (lambda: pipeline.to('cuda'))()
-    print(pipeline.device)
+    (lambda: pipeline.to('cuda'))()

 load_pipeline()
```
