A local tool for testing how your chunking, embedding and retrieval choices affect search recall, before you lock a strategy into production.
Drop in a document, define one or more experiments (a chunking strategy, an embedding model and a retrieval config), run a shared set of queries, and see side by side which chunks come back, how relevant they are, and the recall numbers that matter (Recall@k, MRR, nDCG, hit rate) next to cost and latency.
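The metrics named above have standard definitions. Here is a minimal sketch of them, assuming binary relevance (a chunk either is or is not relevant to a query). The function names and chunk IDs are illustrative and not taken from the assay codebase, which may compute these differently.

```python
import math
from typing import Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def hit_rate_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one relevant chunk appears in the top k, else 0.0."""
    return 1.0 if any(cid in relevant for cid in retrieved[:k]) else 0.0


def reciprocal_rank(retrieved: Sequence[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk (ranks start at 1), or 0.0 if none."""
    for rank, cid in enumerate(retrieved, start=1):
        if cid in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[tuple[Sequence[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


def ndcg_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Binary-gain nDCG: DCG of the ranking divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, cid in enumerate(retrieved[:k], start=1)
        if cid in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Recall@k and hit rate ignore ordering inside the top k, while MRR and nDCG reward placing relevant chunks earlier, which is why it helps to read them together.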
The full product and architecture plan lives in rag-plan/.
M0, Foundations and skeleton: done. One command (assay serve) launches a dashboard that
talks to the backend. The plugin architecture is in place (registries plus a Pydantic
ExperimentConfig). You can create a project, import a .txt or .md, and read its text
back with character offsets and structure. The backend is clean on lint, types and tests, and
the frontend type-checks and builds.
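To illustrate the registry-plus-config idea, here is a rough sketch of how a named registry and an experiment config could fit together. Everything in it is an assumption: `CHUNKERS`, `register_chunker`, `resolve_chunker` and the config fields are hypothetical names, and a frozen dataclass stands in for the Pydantic `ExperimentConfig` so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical plugin type: a chunker maps document text to a list of chunks.
Chunker = Callable[[str], list[str]]

# Hypothetical registry: plugin name -> implementation.
CHUNKERS: dict[str, Chunker] = {}


def register_chunker(name: str) -> Callable[[Chunker], Chunker]:
    """Decorator that records a chunker under a unique name."""

    def decorator(fn: Chunker) -> Chunker:
        if name in CHUNKERS:
            raise ValueError(f"chunker {name!r} is already registered")
        CHUNKERS[name] = fn
        return fn

    return decorator


@register_chunker("paragraph")
def paragraph_chunker(text: str) -> list[str]:
    """Split on blank lines and drop empty pieces."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


@dataclass(frozen=True)
class ExperimentConfig:
    """Stand-in for the real Pydantic model; the field names are assumptions."""

    chunker: str
    embedder: str
    top_k: int = 5


def resolve_chunker(config: ExperimentConfig) -> Chunker:
    """Look up the chunker that the config refers to by name."""
    try:
        return CHUNKERS[config.chunker]
    except KeyError:
        raise ValueError(f"unknown chunker {config.chunker!r}") from None
```

Because the config holds names rather than objects, an experiment can be serialized, stored and replayed, and a new strategy only needs to register itself.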
Next up is M1, the Chunking Lab: real chunkers with a live span overlay, and a PyMuPDF PDF loader.
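A span overlay needs chunkers that report where each chunk sits in the source text, not just the chunk strings. The sketch below shows one way to do that for a fixed-size chunker with overlap. `Span` and `fixed_size_spans` are hypothetical names, not the planned M1 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the document text
    end: int    # exclusive character offset


def fixed_size_spans(text: str, size: int, overlap: int = 0) -> list[Span]:
    """Cover the text with windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    spans: list[Span] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        spans.append(Span(start, end))
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return spans
```

Each chunk is recovered as `text[span.start:span.end]`, so an overlay can highlight spans directly on the original text without re-searching for chunk strings.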
You need Python 3.10 or newer and uv. The frontend uses bun.
# Backend: create the venv and install, dev tools included
uv sync --extra dev
# Run the checks
uv run ruff check .
uv run mypy
uv run pytest
# Frontend (you can skip this while working on the backend)
cd frontend && bun install && bun run build # output goes to ../src/assay/web/static
# Launch the dashboard
uv run assay serve # http://127.0.0.1:7777

src/assay/
  core/    the reusable engine, no web dependency: types, registry, config, loaders
  store/   SQLite metadata (SQLModel) and content-addressed blob storage
  api/     FastAPI app, routers, static serving
  cli.py   the `assay serve` command
frontend/ Vite + React + TS + Tailwind dashboard
tests/ unit and integration tests
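
Content-addressed blob storage, as mentioned for `store/`, means each blob is keyed by a hash of its bytes. Here is a minimal sketch of the idea, assuming SHA-256 digests and a two-character directory fan-out. The `BlobStore` class and its on-disk layout are hypothetical and may differ from what assay does.

```python
import hashlib
from pathlib import Path


class BlobStore:
    """Stores each blob under the SHA-256 hex digest of its contents."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, digest: str) -> Path:
        # Fan out by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest

    def put(self, data: bytes) -> str:
        """Write the blob if it is new and return its digest."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path_for(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        """Read a blob back by digest; raises FileNotFoundError if it is absent."""
        return self._path_for(digest).read_bytes()
```

Identical content always maps to the same key, so importing the same document twice stores it once, and a metadata row only needs to hold the digest.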