ChizuCode

ChizuCode is a codebase exploration and teaching assistant. It turns a GitHub repository into an interactive architecture map, explains what each file does, shows how files connect, and lets users ask questions about the code with source-backed answers.

The Problem

Understanding an unfamiliar codebase is slow. New contributors usually have to jump between folders, read files out of order, guess domain boundaries, and ask teammates for context before they can safely make changes.

This is especially hard when:

The repository has many small files spread across folders.
Important logic crosses frontend, backend, config, and auth layers.
Documentation is missing, stale, or too high level.
A learner needs to understand "how this system works" before reading implementation details.

The Solution

ChizuCode ingests a repository once, builds a semantic map of the project, and gives users a visual way to learn the codebase.

It provides:

A graph of high-level domains and file-level components.
Plain-English summaries for files and clusters.
Connections between related files.
Scoped RAG questions over the repository.
A workflow showcase mode that animates how a process moves through the graph.

Instead of searching file-by-file, users can start from the architecture view, drill into a submap, inspect file responsibilities, and ask targeted questions about the current area.

How It Works

1. Repository Ingestion

The backend receives a GitHub URL and runs a one-time ingestion pipeline:

Clone the repository.
Walk supported source, config, and documentation files.
Split large files into chunks.
Generate summaries with Gemini.
Create semantic embeddings and code embeddings.
Cluster files into meaningful domains.
Persist chunks, vectors, domains, and the cluster tree in Postgres with pgvector.

Once a repository has been ingested, later loads skip the pipeline and read the stored graph and vectors directly from the database, so they are much faster.
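
To make the walk and chunk steps above more concrete, here is a minimal Python sketch. The suffix list, the 80-line chunk size, and the names iter_source_files and chunk_lines are assumptions for illustration; the actual pipeline in backend/services/pipeline.py may split files differently.

```python
from pathlib import Path
from typing import Iterator

# Hypothetical settings; the real supported file types and chunk size may differ.
SUPPORTED_SUFFIXES = {".py", ".ts", ".tsx", ".js", ".json", ".md", ".toml", ".yml", ".yaml"}
MAX_CHUNK_LINES = 80


def iter_source_files(root: Path) -> Iterator[Path]:
    """Yield supported files under root, skipping hidden directories such as .git."""
    for path in sorted(root.rglob("*")):
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part.startswith(".") for part in parent_parts):
            continue
        if path.is_file() and path.suffix in SUPPORTED_SUFFIXES:
            yield path


def chunk_lines(text: str, max_lines: int = MAX_CHUNK_LINES) -> list[dict]:
    """Split text into consecutive chunks of at most max_lines lines."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append({
            "start_line": start + 1,           # 1-indexed, inclusive
            "end_line": start + len(block),    # inclusive
            "text": "\n".join(block),
        })
    return chunks
```

Keeping line ranges on each chunk is one way to let later answers point back at a specific part of a file.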

2. Graph Exploration

The frontend converts the backend cluster tree into an interactive graph. Users can move from a domain-level map into file-level submaps, inspect responsibilities, and follow edges between related files.
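
The conversion from a cluster tree to a graph can be sketched as a tree flattening step. The frontend does this in TypeScript; the sketch below uses Python only to show the idea. The tree shape (id, label, children) and the function name tree_to_graph are assumptions. The output uses the nodes and links lists with source and target fields that react-force-graph-2d expects as graph data.

```python
from typing import Any


def tree_to_graph(tree: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Flatten a nested cluster tree into node and link lists (pre-order)."""
    nodes: list[dict[str, Any]] = []
    links: list[dict[str, Any]] = []
    stack = [(tree, None, 0)]  # (node, parent id, depth)
    while stack:
        node, parent_id, depth = stack.pop()
        nodes.append({
            "id": node["id"],
            "label": node.get("label", node["id"]),
            "depth": depth,  # 0 = repository root, 1 = domain, deeper = files
        })
        if parent_id is not None:
            links.append({"source": parent_id, "target": node["id"]})
        # Reverse so children are visited in their original order.
        for child in reversed(node.get("children", [])):
            stack.append((child, node["id"], depth + 1))
    return {"nodes": nodes, "links": links}
```

This only produces parent-to-child links. Edges between related files would need extra data that the sketch does not model.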

3. RAG Questions

Users can ask natural language questions about the repository. The backend embeds the question, retrieves relevant chunks using pgvector similarity search, and asks Gemini to synthesize an answer with sources.

Queries can be scoped to the current cluster so answers stay relevant to the area the user is viewing.
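
The retrieval and scoping steps can be illustrated with an in-memory sketch. In the real system the ranking happens inside Postgres (pgvector exposes a cosine distance operator, `<=>`, that can be used in ORDER BY); the version below reproduces the same idea in plain Python. The chunk fields (path, cluster_id, embedding) and the function names are assumptions, not ChizuCode's schema.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    chunks: list[dict],
    cluster_id: Optional[str] = None,
    top_k: int = 3,
) -> list[dict]:
    """Return the top_k chunks most similar to query_vec.

    When cluster_id is given, only chunks from that cluster are considered,
    which mirrors scoping a question to the area the user is viewing.
    """
    candidates = [
        c for c in chunks if cluster_id is None or c["cluster_id"] == cluster_id
    ]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]
```

The retrieved chunks would then be passed to the language model together with the question, and their paths returned as sources.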

4. Workflow Showcase

Workflow mode uses a separate backend endpoint to return animation payloads. The graph can navigate to the right submap, focus on relevant files, expand internal components, and animate a process through the system.
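
The README does not document the animation payload format, so the following is only a guess at what such a payload could look like. Every field name (workflow, steps, order, cluster_id, file_path, caption) and the function build_workflow_payload are hypothetical. The sketch shows one property the text implies: each step names both the submap to navigate to and the file to focus on, in a fixed order.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WorkflowStep:
    order: int        # 1-based position in the animation
    cluster_id: str   # submap the graph should navigate to (hypothetical field)
    file_path: str    # file node to focus on
    caption: str      # short text shown while the step plays


def build_workflow_payload(
    name: str,
    steps: list[tuple[str, str]],
    file_to_cluster: dict[str, str],
) -> dict:
    """Build an ordered payload from (file_path, caption) pairs.

    The same inputs always produce the same output, so the animation is
    deterministic. Raises KeyError if a file has no known cluster.
    """
    payload_steps = [
        asdict(WorkflowStep(order, file_to_cluster[path], path, caption))
        for order, (path, caption) in enumerate(steps, start=1)
    ]
    return {"workflow": name, "steps": payload_steps}
```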

Impact

ChizuCode helps developers, students, and teams understand code faster.

It can reduce onboarding time by:

Making project structure visible.
Explaining file purpose before users read implementation.
Showing relationships between modules.
Providing source-backed answers instead of generic chatbot responses.
Turning code review or mentorship into a guided visual walkthrough.

For hackathons, classrooms, open-source onboarding, and internal teams, ChizuCode acts like a teaching assistant for any repository.

Tech Stack

Frontend

Next.js 16
React 19
TypeScript
Tailwind CSS
react-force-graph-2d
Framer Motion
Lucide React

Backend

FastAPI
Python
PostgreSQL
pgvector
psycopg2
Gemini API
Voyage AI embeddings
scikit-learn clustering

Deployment

Frontend: Vercel
Backend: DigitalOcean App Platform
Database: Neon Postgres with pgvector

Backend Architecture

Main backend modules:

backend/main.py - FastAPI application and router registration.
backend/routers/ingest.py - repository ingestion endpoints.
backend/routers/query.py - RAG question endpoint.
backend/routers/workflow.py - workflow animation endpoint.
backend/services/pipeline.py - ingestion orchestrator.
backend/services/embedder.py - summaries and embeddings.
backend/services/clusterer.py - hierarchical clustering and labeling.
backend/services/rag.py - retrieval and answer synthesis.
backend/services/workflow.py - deterministic workflow animation payloads.
backend/db/database.py - database schema and queries.

Core API endpoints:

GET  /health
POST /repo
GET  /repo/{repo_id}
GET  /repo/{repo_id}/graph
POST /repo/{repo_id}/query
POST /repo/{repo_id}/workflow
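
As a rough guide to calling these endpoints from a script, here is a small standard-library client sketch. The paths and methods come from the list above; the JSON body fields (url, question, cluster_id) and the assumption that responses are JSON are guesses, so check the routers in backend/routers/ for the real request models.

```python
import json
import urllib.request
from typing import Any, Optional

API_BASE = "http://127.0.0.1:8000"  # same value as the local NEXT_PUBLIC_API_URL


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Create a request object without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(f"{API_BASE}{path}", data=data, headers=headers, method=method)


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def health_request() -> urllib.request.Request:
    return build_request("GET", "/health")


def ingest_request(github_url: str) -> urllib.request.Request:
    # "url" is a hypothetical field name for the repository URL.
    return build_request("POST", "/repo", {"url": github_url})


def query_request(repo_id: str, question: str, cluster_id: Optional[str] = None) -> urllib.request.Request:
    # "question" and "cluster_id" are hypothetical field names.
    body: dict = {"question": question}
    if cluster_id is not None:
        body["cluster_id"] = cluster_id
    return build_request("POST", f"/repo/{repo_id}/query", body)
```

With the backend running locally, send(health_request()) is a quick way to confirm the API is reachable before starting an ingestion.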

Local Setup

Install frontend dependencies:

npm install

Install backend dependencies:

pip install -r backend/requirements.txt

Create a local Postgres database and schema:

.\.venv\Scripts\python.exe backend/db/init_local_db.py --database codex --user postgres --password "your_password"

Set local backend environment variables (PowerShell syntax):

$env:DATABASE_URL="postgresql://postgres:your_password@localhost:5432/codex"
$env:GEMINI_API_KEY="your_gemini_key"
$env:VOYAGE_API_KEY="your_voyage_key"

Set local frontend environment in .env.local:

NEXT_PUBLIC_API_URL=http://127.0.0.1:8000

Run the backend:

uvicorn backend.main:app --reload

Run the frontend:

npm run dev

Open:

http://localhost:3000

Production Environment Variables

Vercel Frontend

NEXT_PUBLIC_API_URL=https://your-digitalocean-backend-url

DigitalOcean Backend

DATABASE_URL=postgresql://USER:PASSWORD@NEON_HOST/neondb?sslmode=require
GEMINI_API_KEY=your_gemini_key
VOYAGE_API_KEY=your_voyage_key

Optional Gemini key rotation for faster ingestion:

GEMINI_API_KEY_1=...
GEMINI_API_KEY_2=...
GEMINI_API_KEY_3=...
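
One plausible way numbered keys could be consumed is round-robin rotation, sketched below. This is an assumption about the mechanism, not a description of the backend code: the README does not say whether the numbered keys replace or supplement GEMINI_API_KEY, so the sketch prefers the numbered keys and falls back to the single key. The function names are hypothetical.

```python
import itertools
import os
from typing import Iterator, Mapping


def load_gemini_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Collect GEMINI_API_KEY_1, _2, ... until the first missing index."""
    keys: list[str] = []
    index = 1
    while env.get(f"GEMINI_API_KEY_{index}"):
        keys.append(env[f"GEMINI_API_KEY_{index}"])
        index += 1
    # Assumed fallback: use the single key when no numbered keys are set.
    if not keys and env.get("GEMINI_API_KEY"):
        keys.append(env["GEMINI_API_KEY"])
    return keys


def key_cycle(keys: list[str]) -> Iterator[str]:
    """Return an endless round-robin iterator over the keys."""
    if not keys:
        raise ValueError("no Gemini API keys configured")
    return itertools.cycle(keys)
```

Calling next() on the iterator before each request spreads calls evenly across keys, which is how rotation can raise throughput when each key has its own rate limit.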

Optional ingestion tuning:

INGEST_BATCH_SIZE=40
GEMINI_GENERATE_TIMEOUT_SECONDS=45
GEMINI_EMBED_TIMEOUT_SECONDS=45
VOYAGE_EMBED_TIMEOUT_SECONDS=45
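
Tuning variables like these are usually read once at startup and validated. The sketch below shows one way to do that. The variable names come from the list above, but the fallback values simply reuse the numbers shown there (the real defaults are not stated), and IngestSettings and load_ingest_settings are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Union

Number = Union[int, float]


@dataclass(frozen=True)
class IngestSettings:
    batch_size: int
    gemini_generate_timeout: float
    gemini_embed_timeout: float
    voyage_embed_timeout: float


def _read_positive(env: Mapping[str, str], name: str, default: Number,
                   cast: Callable[[str], Number]) -> Number:
    """Read a positive number from env, or return default when unset or blank."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    value = cast(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {raw!r}")
    return value


def load_ingest_settings(env: Mapping[str, str] = os.environ) -> IngestSettings:
    # Fallbacks mirror the example values above; they are assumptions.
    return IngestSettings(
        batch_size=_read_positive(env, "INGEST_BATCH_SIZE", 40, int),
        gemini_generate_timeout=_read_positive(env, "GEMINI_GENERATE_TIMEOUT_SECONDS", 45.0, float),
        gemini_embed_timeout=_read_positive(env, "GEMINI_EMBED_TIMEOUT_SECONDS", 45.0, float),
        voyage_embed_timeout=_read_positive(env, "VOYAGE_EMBED_TIMEOUT_SECONDS", 45.0, float),
    )
```

Failing fast on a zero or negative value avoids an ingestion run that hangs or processes empty batches.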

Deployment Notes

The backend includes a Dockerfile at:

backend/Dockerfile

For DigitalOcean App Platform:

Source directory: backend
Dockerfile path: Dockerfile
HTTP port: 8080
Run command: uvicorn backend.main:app --host 0.0.0.0 --port 8080

Initialize the production database schema once:

.\.venv\Scripts\python.exe backend/db/init_db.py

Use --reset only when intentionally wiping app tables.

Status

ChizuCode currently supports repository ingestion, graph generation, scoped RAG answers, and workflow animation payloads. The system is designed for fast demos once a repo has already been ingested and cached in the database.

Team

Naseer Rehman
Seydi Cheikh Wade
Tri An
Tin Mainiawklang

License

MIT License. See LICENSE.
