Matthew's Personal Codex Agent 🤖

A production-minded retrieval-augmented generation (RAG) agent that lets users query a personal knowledge base in several conversational styles. Built on the Next.js 15 App Router, React 19, and Tailwind CSS, it implements a document processing pipeline and ships a complete Portfolio Demo Mode that runs without API keys or external services, so it can be showcased publicly at no cost.

🔗 Live Demo: matthew-schramm-codex-agent.vercel.app
🔗 Portfolio: matthew-schramm-portfolio.onrender.com
🔗 LinkedIn: linkedin.com/in/matthew-schramm-476523253

📖 Project Overview

Matthew's Codex bridges the gap between raw document sets (PDFs, Markdown, text files) and contextual chat. Instead of relying on generic LLM queries, it uses semantic retrieval to ground answers strictly in Matthew’s personal profile, experience, work-style guidelines, and academic papers.

Why This is Technically Sophisticated

Decoupled Architecture: Features a local-first simulation environment (Demo Mode) that maps exactly to production API boundaries. Recruiters can test the full interface, upload files, simulate vector database injection, and query documents without setting up API keys, Pinecone indexes, or paying for tokens.
Contextual Persona Prompts: Conversational behavior shifts dynamically across five modes (Interview, Story, TL;DR, Humble Brag, Self-Reflection) using prompt preambles while keeping the under-the-hood vector retrieval logic uniform.
Change-Aware Ingestion Pipeline: The ingestion script scans the local document folder and uses MD5 hashing to run incremental updates, embedding only the files that changed, which saves API cost and execution time.
Source Attribution & Citation: Chat bubble answers display clickable source chips that map directly back to the source documents retrieved from the vector index, providing auditability for AI-generated answers.
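
The change-aware ingestion idea above can be sketched as follows. This is a minimal illustration, not the project's actual `scripts/ingest.ts`: the manifest shape (path mapped to MD5 digest) and the function names `md5Hex`, `findChangedFiles`, and `buildManifest` are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest shape: relative file path -> MD5 hex digest.
type HashManifest = Record<string, string>;

// MD5 serves here as a cheap change detector, not as a security measure.
function md5Hex(content: string | Uint8Array): string {
  return createHash("md5").update(content).digest("hex");
}

// Return the paths whose current hash is missing from, or differs from,
// the previously stored manifest. Only these would be re-embedded.
function findChangedFiles(
  files: Record<string, string>,
  previous: HashManifest,
): string[] {
  return Object.entries(files)
    .filter(([path, content]) => previous[path] !== md5Hex(content))
    .map(([path]) => path)
    .sort();
}

// Build the manifest to persist after a successful ingestion run.
function buildManifest(files: Record<string, string>): HashManifest {
  const manifest: HashManifest = {};
  for (const [path, content] of Object.entries(files)) {
    manifest[path] = md5Hex(content);
  }
  return manifest;
}
```

On a second run with an unchanged folder, `findChangedFiles` returns an empty list, so no embedding calls would be made.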

🏗️ System Architecture

The following diagram illustrates how documents are processed in the ingestion pipeline and how queries are dynamically routed based on the environment configuration (Production vs. Demo Mode).

flowchart TD
    subgraph IngestionPipeline ["Ingestion Pipeline (scripts/ingest.ts)"]
        Doc[Raw Files: PDF, MD, TXT] --> Hash{Has file changed?\nMD5 Hash Check}
        Hash -- No --> Skip[Skip File]
        Hash -- Yes --> Chunk[Unified Parser & Chunking\n1200 chars / 200 overlap]
        Chunk --> Embed[OpenAI Embedding\ntext-embedding-3-small]
        Embed --> Upsert[Pinecone Upsert\n1536-dim vector]
    end

    subgraph ChatRetrieval ["Chat Retrieval & Interface (src/app/api/chat)"]
        UserQuery[User Query + Selected Mode] --> DemoCheck{isDemoMode?}
        
        DemoCheck -- Yes (Demo Mode) --> LocalFixture[Local Seeds\ndemo-data.ts]
        LocalFixture --> OutputDemo[Simulated Response\n+ Seeded Sources]
        
        DemoCheck -- No (Production) --> EmbedQuery[Query Embedded]
        EmbedQuery --> VectorQuery[Pinecone Vector Search]
        VectorQuery --> Context[Context Construction\nTop-5 Chunks]
        Context --> SystemPrompt[System Prompt + Mode Preamble]
        SystemPrompt --> LLM[OpenAI GPT-4o-mini]
        LLM --> OutputProd[Response with Source Citations]
    end
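
The chunking step in the diagram (1200 characters with a 200-character overlap) can be illustrated with a simple sliding window. This is a sketch under the assumption that chunks are cut purely by character count; the real parser may split on sentence or paragraph boundaries, and `chunkText` is a hypothetical name.

```typescript
// Split text into fixed-size windows where each chunk repeats the last
// `overlap` characters of the previous one.
function chunkText(text: string, chunkSize = 1200, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults, a 2500-character document yields three chunks starting at offsets 0, 1000, and 2000. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.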

✨ Key Features

Multi-Mode Conversations: Swap styles seamlessly (e.g. professional Interview answers, introspective Self-Reflection, narrative Story logs, or quick TL;DR lists).
Source Attribution: Visual chips showing which documents (resumes, academic transcripts, work-style documents) were referenced.
Administrative Interface (/admin): A fully functional admin dashboard to upload documents, review the current file repository, and run ingestion updates.
Robust CLI Tools: Development commands for clearing and force-reprocessing documents (npm run ingest:clear), dry runs (npm run ingest:dry), and status checks (npm run dataset:status).
Clean UI/UX: Custom Tailwind CSS animations, Radix UI layout elements, and fully responsive layouts.
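
The multi-mode behaviour listed above can be sketched as a lookup of per-mode preambles placed in front of the same retrieved context. The preamble wording, the base prompt, and the names `MODE_PREAMBLES` and `buildSystemPrompt` are assumptions for illustration; the project's real prompts are not reproduced here.

```typescript
// The five modes named in this README; the identifiers are illustrative.
type Mode = "interview" | "story" | "tldr" | "humbleBrag" | "selfReflection";

// Hypothetical preamble wording, one entry per mode.
const MODE_PREAMBLES: Record<Mode, string> = {
  interview: "Answer concisely and professionally, as in a job interview.",
  story: "Answer as a short first-person narrative.",
  tldr: "Answer in at most three bullet points.",
  humbleBrag: "Answer confidently, highlighting achievements.",
  selfReflection: "Answer introspectively, noting strengths and growth areas.",
};

// Hypothetical base instruction shared by every mode.
const BASE_PROMPT =
  "Answer using only the numbered context below and cite the sources you used.";

// Only the preamble varies by mode; the context block is built the same way
// regardless of which mode is selected.
function buildSystemPrompt(mode: Mode, contextChunks: string[]): string {
  const context = contextChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n\n");
  return [BASE_PROMPT, MODE_PREAMBLES[mode], "Context:", context].join("\n\n");
}
```

Keeping the mode out of the retrieval path means a new style can be added by editing one table entry.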

🔧 Local Development & Setup

You can run the application in either Portfolio Demo Mode (zero setup required) or Production Mode (connected to your own OpenAI and Pinecone accounts).

Prerequisites

Node.js v20+
npm

Quick Start (Demo Mode)

To run the app immediately with local seed data and mock API boundaries:

Setup Environment:
```
cp .env.example .env.local
```
(Ensure DEMO_MODE=true and NEXT_PUBLIC_DEMO_MODE=true are set inside .env.local)
Install & Run (using make):
```
make setup
make demo
```
Alternatively, run npm install && npm run dev.
Open http://localhost:3000 to chat, or visit http://localhost:3000/admin to explore the simulated dataset manager.
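
The "mock API boundaries" mentioned above can be pictured as two responders that share one response shape, with a flag choosing between them. This is a hedged sketch: `ChatResponse`, `Responder`, `demoResponder`, `createChatHandler`, and the seeded file name are hypothetical and are not taken from the project's `demo-data.ts`.

```typescript
// Response shape shared by both modes, so the client code is identical.
interface ChatResponse {
  answer: string;
  sources: string[];
}

type Responder = (query: string, mode: string) => Promise<ChatResponse>;

// Hypothetical seeded responder: no network calls, fixed sources.
const demoResponder: Responder = async (query, mode) => ({
  answer: `[demo:${mode}] Seeded answer for: ${query}`,
  sources: ["resume.pdf"], // illustrative file name
});

// Pick the responder once, based on the demo flag. The live responder
// (embedding, vector search, LLM call) is injected by the caller.
function createChatHandler(isDemoMode: boolean, liveResponder: Responder): Responder {
  return isDemoMode ? demoResponder : liveResponder;
}
```

Because both branches return the same `ChatResponse` shape, loading states and source chips in the UI behave the same way in either mode.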

Running in Production Mode

To connect the application to real vector stores and AI models:

Configure API Keys: Edit .env.local to disable demo mode and add your production keys:

DEMO_MODE=false
NEXT_PUBLIC_DEMO_MODE=false
OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=your_index_name

Add Your Documents: Place your personal documents (PDF, Markdown, or TXT) inside src/data/.
Ingest the Knowledge Base: Run the ingestion script to parse, embed, and upsert vectors into Pinecone:
```
make ingest
```
(Use make ingest-dry to run a preview of what chunks would be created before hitting OpenAI/Pinecone).
Launch Application:
```
make dev
```
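
Before launching in Production Mode it can help to fail fast when a key is missing. The sketch below validates the environment variables listed above; the `AppConfig` type, the `resolveConfig` name, and the rule that only the exact string `true` enables demo mode are assumptions, not the project's confirmed behaviour.

```typescript
type AppConfig =
  | { mode: "demo" }
  | {
      mode: "production";
      openaiApiKey: string;
      pineconeApiKey: string;
      pineconeIndex: string;
    };

// Variables required when demo mode is off (names taken from .env.local above).
const REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"] as const;

function resolveConfig(env: Record<string, string | undefined>): AppConfig {
  // Assumption: demo mode is on only when the flag is exactly "true".
  if (env["DEMO_MODE"] === "true") {
    return { mode: "demo" };
  }
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    mode: "production",
    openaiApiKey: env["OPENAI_API_KEY"]!,
    pineconeApiKey: env["PINECONE_API_KEY"]!,
    pineconeIndex: env["PINECONE_INDEX"]!,
  };
}
```

In a Node.js process this would typically be called once at startup as `resolveConfig(process.env)`.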

🛠️ Makefile Commands

The included Makefile provides short, standard targets for development and administration:

| Command | Action |
| --- | --- |
| `make setup` | Installs project dependencies and copies the `.env.example` template |
| `make dev` | Starts the Next.js development server in Production Mode |
| `make demo` | Starts the development server in offline, seed-backed Portfolio Demo Mode |
| `make ingest` | Runs the document ingestion script to process and embed new files |
| `make ingest-dry` | Previews document chunking and metadata generation without uploading to Pinecone |
| `make ingest-clear` | Clears the existing Pinecone vectors for files being re-ingested |
| `make build` | Builds the production bundle |
| `make lint` | Runs the TypeScript and ESLint checks |

🧠 Engineering Decisions & Tradeoffs

Legacy PDF Parsing in Node: Node.js environments often struggle with native client-side PDF readers due to canvas and DOM dependencies. We resolved this by employing the pdfjs-dist/legacy/build/pdf.mjs loader directly in scripts/ingest.ts, allowing unified local PDF parsing without requiring OS-level binaries.
Permissive Relevance Thresholds: The retrieval threshold in src/app/api/chat/route.ts is set to 0.2 rather than a stricter 0.7. For short documents such as resumes and cover letters this keeps answers warm and informative, and the agent falls back gracefully to general conversation rather than failing abruptly on minor semantic mismatches.
Decoupled API Contract: Because the isDemoMode check lives inside the API handlers (/api/chat, /api/upload, /api/ingest, /api/dataset), the React client runs the same async fetch flows in Demo Mode as it would on a live site, so the demo exercises the real UI states and API routes.
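
The permissive threshold and the top-5 context window can be sketched as a small post-processing step over vector search results. The `Match` shape and the `selectContext` name are assumptions for illustration; the actual fields returned by the vector store and used in `route.ts` may differ.

```typescript
// Hypothetical shape of one vector search hit.
interface Match {
  score: number;  // similarity score, higher is more relevant
  source: string; // originating document, shown as a source chip
  text: string;   // chunk text passed to the LLM as context
}

// Keep hits at or above the threshold, take the best `topK`, and collect
// the distinct source documents for attribution.
function selectContext(
  matches: Match[],
  minScore = 0.2,
  topK = 5,
): { chunks: string[]; sources: string[] } {
  const kept = matches
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  return {
    chunks: kept.map((m) => m.text),
    sources: Array.from(new Set(kept.map((m) => m.source))),
  };
}
```

An empty `chunks` array is the signal a caller could use to switch to a general conversational reply instead of a grounded one.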

📊 Project Status & Position

This project is an active showcase and portfolio demo. It demonstrates modern engineering practices in full-stack Next.js design, RAG implementation, security compliance, and developer convenience.

For comments, feedback, or networking, please contact Matthew at mattschramm1235@gmail.com or visit the Portfolio.
