## Overview

This project is a Retrieval-Augmented Generation (RAG) system built with LangChain and FastAPI that generates context-aware responses from custom documents.
## Features

- Document ingestion and chunking
- Semantic search using vector embeddings (Pinecone)
- FastAPI-based REST API for querying
- Dockerized setup
- Basic monitoring integration
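The ingestion step splits each document into overlapping chunks before embedding. The real project presumably uses a LangChain text splitter; the sketch below shows the same idea with a simple fixed-size chunker (the chunk size and overlap values are illustrative, not taken from this repo):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so context
    spanning a chunk boundary is not lost at retrieval time."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk is then embedded and upserted into the Pinecone index.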
## Tech Stack

- Python
- FastAPI
- LangChain
- Pinecone
- Docker
## Architecture

User Query → FastAPI → Retriever → LLM → Response
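The flow above can be sketched end to end with stand-ins for the two external services. The corpus, scores, and function names here are hypothetical placeholders for the Pinecone search and the LLM call:

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    score: float

def retrieve(query: str, top_k: int = 3) -> list[Document]:
    # Stand-in for the Pinecone similarity search; documents and
    # scores are hard-coded for illustration only.
    corpus = [
        Document("RAG combines retrieval with generation.", 0.92),
        Document("FastAPI serves the /query endpoint.", 0.88),
        Document("Chunks are embedded and stored in Pinecone.", 0.85),
    ]
    return corpus[:top_k]

def generate_answer(query: str, context: list[Document]) -> str:
    # Stand-in for the LLM call: real code would prompt the model
    # with the retrieved context plus the user query.
    joined = " ".join(d.text for d in context)
    return f"Answer grounded in {len(context)} retrieved chunks: {joined}"

def handle_query(query: str) -> dict:
    docs = retrieve(query)
    return {"answer": generate_answer(query, docs)}
```

In the real service, `handle_query` is what the FastAPI route delegates to.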
## API

`POST /query`

- Request: `{ "query": "What is this document about?" }`
- Response: `{ "answer": "..." }`
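The request/response contract is small enough to pin down directly. This is a minimal, framework-free sketch of the payload validation the endpoint needs (in the real service, FastAPI/Pydantic models would do this; the function names here are illustrative):

```python
import json

def parse_query_request(raw: str) -> str:
    """Validate the POST /query body and return the query string."""
    payload = json.loads(raw)
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' must be a non-empty string")
    return query

def build_query_response(answer: str) -> str:
    """Serialize the response body in the documented shape."""
    return json.dumps({"answer": answer})
```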
## Getting Started

1. Install dependencies
2. Run the FastAPI server
3. Send a request to `/query`
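Concretely, the steps above look something like the following. The file and module names (`requirements.txt`, `app.main`) and the port are assumptions; adjust them to this repo's actual layout:

```shell
# Install dependencies (requirements.txt path is an assumption)
pip install -r requirements.txt

# Run the FastAPI server (module path app.main:app is an assumption)
uvicorn app.main:app --reload --port 8000

# Send a query
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is this document about?"}'
```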
## Roadmap

- Hybrid search
- Response caching
- Evaluation metrics
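Of the roadmap items, response caching has the most obvious minimal form: memoize the answer pipeline keyed on the query string, so repeated identical queries skip retrieval and generation. A sketch, with a placeholder pipeline (the function body is hypothetical; a production cache would also need a TTL and invalidation when documents change):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_answer(query: str) -> str:
    # Stand-in for the full retrieve-then-generate pipeline.
    # The expensive work runs only on a cache miss.
    return f"answer for: {query}"
```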
