AI Knowledge Hub is a local and cloud-integrated automation environment that combines OpenWebUI, Groq, Firecrawl, and Ollama for intelligent data scraping, knowledge management, and AI-assisted automation.
This project leverages the Model Context Protocol (MCP) framework to enable AI models to directly interact with external tools and APIs.
Through Groq’s MCP-compatible Responses API and Firecrawl’s MCP server, it automates the retrieval, summarization, and organization of AI-related data into a synchronized knowledge base accessible through OpenWebUI.
The purpose of this project is to provide a unified, modular system for:
- Running OpenWebUI locally with GPU acceleration through NVIDIA Docker.
- Integrating Groq API for high-speed, reasoning-capable LLM inference.
- Automating web scraping and summarization tasks via Firecrawl API.
- Organizing all collected data into a local directory for OpenWebUI knowledge sync.
- Enabling both local and cloud-based AI models to cooperate seamlessly in one environment.
This setup is ideal for research, AI knowledge aggregation, and experimentation with LLM-powered automation.
- Automated Data Scraping
  - harvest_ai_models.js: Scrapes trending AI models from Hugging Face.
  - harvest_reddit_ai.js: Scrapes AI-related subreddit discussions.
  - harvest_investor_news_firecrawl.js: Scrapes finance and investment news and market sentiment.
- Persistent Output Management
  - Organized directories under /outputs for each data source.
  - Optional live mount to OpenWebUI’s /uploads folder for automatic sync.
- Multi-Model Support
  - Local inference with Ollama (e.g., gemma3:4b or llama3:8b).
  - Cloud inference through Groq (e.g., gpt-oss-120b, llama-3.1-70b).
- GPU Acceleration
  - Integrated NVIDIA driver support for local LLM performance.
- Containerized Environment
  - Single docker-compose.yml handles build, run, and persistence.
- Safe Configuration
  - .env file manages API keys and model configuration.
- Finance news summary from InvestorNe: MCP web scraping with Groq GPT-OSS-120b and Firecrawl, summarized locally by a Llama3.3-70b base model with knowledge/system prompts.
- Hugging Face and AI & LLM subreddit new-post summaries and reasoning by a customized Llama3.3-70b-based AI Researcher model.
- MCP prompt -> scrape -> summarize automation in action.
ai_knowledge_hub/
│
├── docker-compose.yml # Docker setup for OpenWebUI with GPU + mounts
├── harvest_ai_models.js # Scraper for Hugging Face trending models
├── harvest_reddit_ai.js # Scraper for Reddit AI communities
├── package.json # Node dependencies for scraper scripts
├── .env # Environment variables (not committed)
├── .env.example # Example environment file for setup
├── outputs/ # Auto-generated data folder
│ ├── RedditAI/
│ └── AI_Models/
└── README.md # Project documentation
git clone https://github.com/hieudku/ai-knowledge-hub.git
cd ai-knowledge-hub

Duplicate the example .env file and edit your API credentials:

cp .env.example .env

Then open .env and insert your actual keys:
GROQ_API_KEY=your_groq_api_key_here
FIRECRAWL_API_KEY=your_firecrawl_api_key_here
OLLAMA_API_BASE_URL=http://host.docker.internal:11434
DEFAULT_MODEL=choose_your_own_local_model
ENABLE_MCP_SERVERS=true
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility

Start the container:

docker compose up -d

Once running, visit:
http://localhost:3000
OpenWebUI is now running with GPU acceleration, a local Ollama connection, and the Groq and Firecrawl integrations available for scraping and model inference.
The Node.js scripts automate data scraping and feed outputs to OpenWebUI’s mounted knowledge path.
node harvest_ai_models.js
node harvest_reddit_ai.js

All scraped data will be saved automatically into:
outputs/AI_Models/
outputs/RedditAI/
If your Docker container mounts this folder (as configured in docker-compose.yml), the files will appear automatically inside OpenWebUI’s Knowledge → Sync Directory for instant indexing.
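
As a rough illustration of how a harvest script might write into these folders, here is a minimal sketch. The helper names (`buildOutputPath`, `saveScrape`) and the timestamped `.md` file name are assumptions made for this example; the actual scripts may name and format their files differently.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Build a path such as outputs/AI_Models/2025-01-31T12-00-00-000Z.md.
// The timestamped file name is an assumed convention, not necessarily the project's.
function buildOutputPath(baseDir, source, date = new Date()) {
  // Replace ":" and "." so the name is also valid on Windows.
  const stamp = date.toISOString().replace(/[:.]/g, "-");
  return path.join(baseDir, source, `${stamp}.md`);
}

// Write scraped text to disk, creating the per-source folder if it is missing.
function saveScrape(baseDir, source, content, date = new Date()) {
  const filePath = buildOutputPath(baseDir, source, date);
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, content, "utf8");
  return filePath;
}
```

Calling `saveScrape("outputs", "RedditAI", summaryText)` would then place a new file under outputs/RedditAI/, which the mounted volume exposes to the container.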
Example .env variables:
# API Integrations
GROQ_API_KEY=
FIRECRAWL_API_KEY=
# Local Ollama Configuration
OLLAMA_API_BASE_URL=http://host.docker.internal:11434
DEFAULT_MODEL=gemma3:4b
# External MCP Integration
ENABLE_MCP_SERVERS=true
# GPU Configuration
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
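
Since the harvest scripts fail when keys are missing, it can help to check the file before running them. The sketch below is one possible approach, not part of the project: `parseEnv` and `missingKeys` are hypothetical helpers, the list of required keys is an assumption, and the parser handles only plain KEY=VALUE lines (no quoting or multi-line values).

```javascript
// Keys the harvest scripts are assumed to need; adjust to match your scripts.
const REQUIRED_KEYS = ["GROQ_API_KEY", "FIRECRAWL_API_KEY"];

// Parse KEY=VALUE lines, skipping blank lines and # comments.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}

// Return the required keys that are absent or left empty.
function missingKeys(vars, required = REQUIRED_KEYS) {
  return required.filter((key) => !vars[key]);
}
```

For example, `missingKeys(parseEnv(require("node:fs").readFileSync(".env", "utf8")))` returns an empty array when both API keys are filled in.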
Your docker-compose.yml handles all service and volume mounting:
version: "3.9"
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "3000:8080"
    env_file:
      - .env
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ollama:/root/.ollama
      - openwebui_data:/app/backend/data
      - C:/Users/hieuc/ai_knowledge_hub/outputs:/app/backend/data/uploads/mounted-scrapping
volumes:
  ollama:
  openwebui_data:

AI Knowledge Hub integrates multiple systems:
| Component | Role | Description |
|---|---|---|
| OpenWebUI | Frontend interface | Chat UI for local and external LLMs |
| Ollama | Local inference | Runs small to medium-sized open models locally |
| Groq API | Cloud inference | Provides ultra-fast reasoning models (e.g., GPT-OSS-120B) |
| Firecrawl | Web scraping API | Automates extraction and summarization of web content |
| Harvest Scripts | Automation layer | Fetches and saves AI content to /outputs/ |
| Docker Compose | Deployment orchestrator | Builds and runs the system with GPU and persistent volumes |
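
To make the Groq and Firecrawl roles more concrete, the following sketch shows roughly what a request that lets a Groq-hosted model call a Firecrawl MCP server could look like. Several details are assumptions to verify against the Groq and Firecrawl documentation: the endpoint URL, the model ID `openai/gpt-oss-120b`, and the tool fields (`type`, `server_label`, `server_url`), which follow the OpenAI-style Responses API shape. The Firecrawl MCP server URL is deliberately left as a parameter, and `buildMcpRequest` and `runMcpPrompt` are hypothetical names. It relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Assumed endpoint for Groq's OpenAI-compatible Responses API; check Groq's docs.
const GROQ_RESPONSES_URL = "https://api.groq.com/openai/v1/responses";

// Build the JSON body: a prompt plus one remote MCP server the model may call.
function buildMcpRequest({ prompt, firecrawlServerUrl, model = "openai/gpt-oss-120b" }) {
  return {
    model, // assumed Groq model ID
    input: prompt,
    tools: [
      {
        type: "mcp",
        server_label: "firecrawl",
        server_url: firecrawlServerUrl, // taken from Firecrawl's MCP documentation
      },
    ],
  };
}

// Send the request and return the parsed JSON response.
async function runMcpPrompt(prompt, { groqApiKey, firecrawlServerUrl }) {
  const response = await fetch(GROQ_RESPONSES_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${groqApiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildMcpRequest({ prompt, firecrawlServerUrl })),
  });
  if (!response.ok) {
    throw new Error(`Groq request failed with status ${response.status}`);
  }
  return response.json();
}
```

Keeping the payload builder separate from the network call makes it easy to inspect or log the request before any API credits are spent.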
- Launch OpenWebUI using Docker Compose.
- Run scraping scripts (harvest_ai_models.js or harvest_reddit_ai.js).
- Files will appear in the mounted /uploads/ folder inside the container.
- Open OpenWebUI → Knowledge Tab → Sync or refresh your directory.
- Query your LLM (Groq, Ollama, etc.) with up-to-date contextual knowledge.
- Do not commit your .env file.
- Create and share only .env.example with placeholder keys.
- Review and restrict your API usage limits on Groq and Firecrawl dashboards.
- If exposing your instance publicly, configure authentication in OpenWebUI settings.
- Regularly back up your outputs/ directory if you rely on long-term data accumulation.
| Issue | Possible Cause | Solution |
|---|---|---|
| Container fails to start | Missing NVIDIA runtime or GPU driver | Install nvidia-container-toolkit |
| Ollama Network Problem | Wrong host binding | Ensure OLLAMA_API_BASE_URL uses http://host.docker.internal:11434 |
| Harvest scripts fail | Missing or expired API keys | Verify .env file values |
| No scraped data in OpenWebUI | Mount path mismatch | Ensure host outputs/ is mapped to a folder under /app/backend/data/uploads (the compose file above uses /app/backend/data/uploads/mounted-scrapping) |
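
For the Ollama connectivity row, a quick probe can confirm whether the configured base URL is reachable. This is a sketch under assumptions: `ollamaTagsUrl` and `checkOllama` are hypothetical helpers, the probe uses Ollama's model-listing endpoint (`GET /api/tags`), and it needs Node.js 18 or later for the global `fetch` and `AbortSignal.timeout`.

```javascript
// Build the URL of Ollama's model-listing endpoint from the configured base URL.
function ollamaTagsUrl(baseUrl) {
  return new URL("/api/tags", baseUrl).toString();
}

// Resolve to { ok: true, models } when Ollama answers, or { ok: false, reason } otherwise.
async function checkOllama(baseUrl, timeoutMs = 3000) {
  try {
    const res = await fetch(ollamaTagsUrl(baseUrl), {
      signal: AbortSignal.timeout(timeoutMs), // give up instead of hanging
    });
    if (!res.ok) return { ok: false, reason: `HTTP ${res.status}` };
    const body = await res.json();
    return { ok: true, models: (body.models || []).map((m) => m.name) };
  } catch (err) {
    return { ok: false, reason: err.message };
  }
}
```

Note that host.docker.internal is meant to be resolved from inside the container; when running the probe directly on the host, a base URL such as http://localhost:11434 is likely the one to test.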
This project is released for educational and research use only.
You may modify or extend it freely for local AI development or internal tooling.
All dependencies are subject to their respective licenses.
Developed and maintained by Hieu Cu.
Combines open-source components from the OpenWebUI, Ollama, Groq, and Firecrawl ecosystems to create a fully modular AI research and data automation hub.


