KnowBridge is a lightweight Retrieval-Augmented Generation (RAG) app that turns a set of documents into a grounded, searchable knowledge assistant.
It provides:
- A Gradio Web UI to upload and index .md, .docx, and .doc files
- Persistent ChromaDB vector storage (no re-embedding every run)
- Semantic retrieval + grounded answering using a Groq-hosted LLM
- Session-based chat persistence (SQLite) with rolling summarization for long chats
KnowBridge is designed to work with user-supplied document collections across domains rather than being tied to a single business function.
Supported usage:
- Upload .md, .docx, or .doc files
- Ask factual, procedural, or follow-up questions answerable from those documents
- Receive grounded answers with a SOURCES: line for traceability
Out of scope:
- Questions requiring external knowledge not present in the uploaded documents
- Live data lookups from external systems
- Formal benchmark-grade retrieval evaluation in the current version
- Upload .md, .docx, or .doc files in the Knowledge Base tab.
- The app hashes each file and only re-indexes files that changed.
- Documents are chunked, embedded with a sentence-transformer model, and stored in ChromaDB.
- In the Chat tab, your question is embedded, relevant chunks are retrieved, and the LLM answers using only retrieved context.
- The assistant appends a SOURCES: line for traceability and stores chat history in SQLite.
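
The hash-and-skip step can be sketched roughly as below. This is a minimal illustration, not the app's actual code: the function names, the JSON manifest file, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path

# Extensions the README lists as supported.
SUPPORTED_SUFFIXES = {".md", ".docx", ".doc"}


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def files_needing_reindex(doc_dir: Path, manifest_path: Path) -> list[Path]:
    """Return new or changed files, then save the current hashes.

    The manifest (hypothetical) is a JSON object mapping file name to digest.
    """
    previous = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text(encoding="utf-8"))

    current = {}
    changed = []
    for path in sorted(doc_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        current[path.name] = file_sha256(path)
        if previous.get(path.name) != current[path.name]:
            changed.append(path)

    # Simplification: a real pipeline would likely save the manifest only
    # after the changed files were embedded successfully.
    manifest_path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return changed
```

Calling `files_needing_reindex` twice in a row on an unchanged folder returns an empty list the second time, which is what lets the app skip re-embedding.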
- Python 3.11 recommended
Note: Some dependencies (ChromaDB / PyTorch / sentence-transformers) may not have wheels for very new Python versions yet (e.g. 3.14). If your python3 is too new, use python3.11.
Install all dependencies from this folder’s requirements file:
pip install -r requirements.txt

Key packages include: langchain, langchain_groq, chromadb, sentence-transformers, gradio, python-dotenv, pyyaml, python-docx.
This app requires a Groq API key.
Create a .env file in the project root (this folder) with:
GROQ_API_KEY=your-groq-api-key

Get a key from: https://console.groq.com/
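
A quick way to confirm the key is visible before launching is a small check like the one below. It is a sketch using only the standard library. The helper name `require_groq_key` is hypothetical, and it assumes the .env values have already been copied into the process environment (python-dotenv's `load_dotenv()` is the usual way to do that, though how the app itself loads the file is not shown here).

```python
import os
from typing import Mapping


def require_groq_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Groq API key from the environment, or fail with a clear message.

    Hypothetical helper: rejects a missing key and the README's placeholder value.
    """
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == "your-groq-api-key":
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it to the .env file in the project root."
        )
    return key
```

Failing early with a readable message is usually friendlier than letting the first LLM call fail with an authentication error.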
From the rag-knowledge-assistant/ directory:
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python code/app.py

Then open the local Gradio URL printed in the terminal (typically http://127.0.0.1:7860).
- App settings (LLM model, retrieval params): code/config/config.yaml
- Prompt rules (grounding, SOURCES: formatting): code/config/prompt_config.yaml
Common knobs:
- vectordb.n_results (top-k retrieval)
- vectordb.threshold (distance threshold)
- llm (Groq model name)
- Sample knowledge docs: data/ (sample .md files included)
- Vector DB persisted to: outputs/vector_db/
- Chat history persisted to: outputs/chat_history.db
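
For orientation, session-based chat persistence in SQLite can look roughly like this. The table name, columns, and function names are assumptions for illustration; the actual schema in outputs/chat_history.db may differ.

```python
import sqlite3


def open_history(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a chat-history database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL
        )
        """
    )
    return conn


def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to a session; the 'with' block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def load_session(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for one session in insertion order."""
    cursor = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return cursor.fetchall()
```

Passing a file path such as outputs/chat_history.db instead of the in-memory default keeps the history across restarts.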
- The UI supports indexing .md, .docx, and .doc.
- For .doc, the app will try to extract text from Confluence/Atlassian export formats (MIME-wrapped HTML) first, then fall back to macOS textutil or LibreOffice (soffice) conversion.
- If retrieval returns no chunks under the threshold, the pipeline falls back to returning the top-k results to avoid empty context.
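
The threshold-with-fallback behavior can be sketched as follows. This is an illustration under stated assumptions, not the app's code: `select_chunks` is a hypothetical name, hits are modeled as (text, distance) pairs where a lower distance means a closer match, and a distance equal to the threshold is treated as passing.

```python
def select_chunks(
    hits: list[tuple[str, float]],
    threshold: float,
    n_results: int,
) -> list[tuple[str, float]]:
    """Keep the closest n_results hits that pass the distance threshold.

    If none pass, return the closest n_results unfiltered so the LLM
    never receives an empty context.
    """
    # Closest first, capped at top-k.
    ranked = sorted(hits, key=lambda hit: hit[1])[:n_results]
    within = [hit for hit in ranked if hit[1] <= threshold]
    return within if within else ranked
```

With this shape, lowering vectordb.threshold makes answers stricter when good matches exist, while the fallback still returns the nearest chunks for off-topic questions, so the grounding prompt has to handle weakly related context.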
This project is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
- See LICENSE for the full text.
- Attribution: If you share, reuse, or adapt this project, please provide attribution.
- Credit: Raja (this KnowBridge implementation). Retain the original license notice.