A full‑stack multi‑tenant Retrieval‑Augmented Generation (RAG) platform that allows teams to upload documents (PDF, DOCX, Images), index them into a vector database, and ask questions strictly based on the uploaded content — with streaming AI responses.
This project is built as a real‑world, production‑style system with authentication, file ingestion, OCR fallback, vector search, and LLM integration.
- 🔐 Authentication & Multi‑Tenancy (team‑based isolation)
- 📤 File Upload Support
  - PDF (text + OCR fallback)
  - DOCX
  - Images (PNG / JPG via OCR)
- 🧠 RAG Pipeline (Retrieve → Augment → Generate)
- 📦 Vector Database with Qdrant
- 🤖 Local LLM via Ollama (LLaMA 3)
- ⚡ Streaming AI Responses (token‑by‑token)
- 🧾 Context‑only Answers (refuses to answer outside the retrieved context)
- 💬 Modern Chat UI (Markdown + streaming cursor)
Frontend:
- Next.js 14 (App Router)
- React
- TypeScript
- Tailwind CSS
- React Markdown
- Axios
- React Hot Toast

Backend:
- Node.js
- Express.js
- TypeScript
- Multer (file uploads)
- JWT Authentication
- Axios

AI & Document Processing:
- Ollama (LLaMA 3 for generation)
- Qdrant (vector database)
- OCR: Tesseract.js
- DOCX Parsing: Mammoth
- PDF Handling: Poppler (pdftoppm)
Frontend (Next.js)
↓
Express API (Auth + Upload + Chat)
↓
Text Extraction (PDF / DOCX / OCR)
↓
Embedding Generation
↓
Qdrant Vector Store (per team)
↓
Context Retrieval
↓
Ollama (LLaMA 3)
↓
Streaming Answer → Frontend
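The "Context Retrieval" step above is nearest‑neighbour search over embeddings. In this project Qdrant performs that search; purely as an illustration of the idea, an in‑memory cosine‑similarity ranker (all names hypothetical, not the repo's code) might look like:

```typescript
// Illustrative only: the project delegates this search to Qdrant.
// Ranks stored chunks by cosine similarity to a query embedding.
type Chunk = { text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Returns the k chunks most similar to the query vector.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}
```

The top‑k chunk texts are what gets concatenated into the prompt sent to the LLM.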
backend/
├─ src/
│ ├─ controllers/
│ │ ├─ ingest.controller.ts
│ │ └─ ask.controller.ts
│ ├─ routes/
│ │ ├─ ingest.routes.ts
│ │ └─ ask.routes.ts
│ ├─ utils/
│ │ ├─ rag.ts
│ │ ├─ qdrant.ts
│ │ └─ token.ts
│ ├─ app.ts
│ └─ server.ts
├─ uploads/
└─ .env
frontend/
├─ app/
│ └─ chat/page.tsx
└─ lib/api.ts
Create a .env file in backend/:
PORT=4000
JWT_SECRET=your_secret_key
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=docs
OLLAMA_URL=http://localhost:11434

Make sure you have:
- Node.js 18+
- Ollama installed
- Qdrant running
- Poppler installed (for PDF → image OCR)
- Download Poppler
- Add its bin/ folder to PATH
- Verify: pdftoppm -h

Run qdrant.exe (or use the official binary).

ollama pull llama3
ollama serve

cd backend
npm ci
npm run dev

Backend runs on: http://localhost:4000

cd frontend
npm ci
npm run dev

Frontend runs on: http://localhost:3000
- Endpoint: POST /upload
- Auth required (JWT)
- Form‑Data key: file
Supported formats: .pdf, .docx, .png, .jpg, .jpeg
Each document is:
- Parsed / OCR‑ed
- Converted into text
- Embedded
- Stored in Qdrant with team isolation
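Server‑side, uploads can be gated to the supported formats before any parsing starts. A minimal extension check (a hypothetical helper, not necessarily the repo's exact code):

```typescript
// Hypothetical helper: accept only formats the ingestion pipeline handles.
const SUPPORTED = new Set([".pdf", ".docx", ".png", ".jpg", ".jpeg"]);

function isSupportedUpload(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  return SUPPORTED.has(filename.slice(dot).toLowerCase());
}
```

In an Express route this would run right after Multer hands over the file, rejecting anything else with a 400.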
- Endpoint: POST /chat/ask
- Request body:

{
  "question": "What skills are mentioned in the resume?"
}

The model is instructed to:
- ❌ Never use outside knowledge
- ❌ Never guess
- ✅ Answer only from retrieved context
- ✅ Say "Not found in the context" if missing
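Constraints like these are typically enforced in the prompt itself. A sketch of a context‑only prompt builder (the wording is illustrative, not the repo's exact template):

```typescript
// Illustrative prompt template enforcing context-only answers.
function buildPrompt(contextChunks: string[], question: string): string {
  return [
    "Answer ONLY from the context below.",
    "Do not use outside knowledge and do not guess.",
    'If the answer is not in the context, reply: "Not found in the context".',
    "",
    "Context:",
    ...contextChunks,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```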
- Backend streams tokens using res.write()
- Frontend reads via ReadableStream
- An animated inline loader appears during streaming
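On the frontend, consuming the stream amounts to reading the response body chunk by chunk. A self‑contained sketch (the onToken callback is where a UI would append text and show the streaming cursor):

```typescript
// Reads a streamed response body and invokes onToken for each decoded chunk.
async function readTokens(
  body: ReadableStream<Uint8Array>,
  onToken: (token: string) => void
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let answer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const token = decoder.decode(value, { stream: true });
    answer += token;
    onToken(token); // e.g. append to the chat bubble
  }
  return answer;
}
```

In the app this would be called with the body of a fetch to POST /chat/ask; Node 18+ and all modern browsers provide ReadableStream natively.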
🚨 Example answer, grounded in the uploaded context:
📌 Skills
• JavaScript, TypeScript
• React, Next.js
• Node.js, Express
• Tailwind CSS
- JWT‑based authentication
- Each document stored with team metadata
- Qdrant filters ensure no cross‑team data leaks
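The per‑team filter is plain Qdrant payload filtering: every search carries a must‑match condition on the team key. A sketch of the filter object (payload key name `team` as described above; exact wiring into the search call is an assumption):

```typescript
// Builds a Qdrant payload filter restricting search to one team's documents.
// Passed as the `filter` field of a Qdrant search request alongside the
// query vector, so other teams' points are never candidates.
function teamFilter(team: string) {
  return {
    must: [{ key: "team", match: { value: team } }],
  };
}
```

Because the filter is applied inside Qdrant, isolation does not depend on post‑filtering results in application code.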
- ✅ Real embedding model (nomic‑embed‑text)
- 🔍 Chunking & overlap
- 🗂️ Document management UI
- 📊 Token usage stats
- 🌍 Cloud deployment
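For the planned chunking & overlap item, one common approach is fixed‑size sliding windows, where neighbouring chunks share a small overlap so sentences cut at a boundary still appear whole in one chunk. A minimal sketch (parameter defaults are arbitrary):

```typescript
// Sliding-window chunker: fixed-size chunks with overlap between neighbours.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded and stored as its own Qdrant point.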
Aryan Gawade
This project demonstrates real‑world RAG architecture, AI streaming UX, and production‑ready backend patterns — suitable for internships, final‑year projects, and portfolios.
If you like it, ⭐ the repo!