Book2Vision

Book2Vision is an automated system that transforms digital books (PDF, EPUB, TXT) into complete multimedia packages — audiobooks, visual scene galleries, character portraits, AI podcasts, interactive storybooks, video summaries, and intelligent Q&A.

Features

Multi-Format Ingestion — Supports PDF (including scanned via OCR), EPUB, and TXT formats
Semantic Analysis — Extracts characters, scenes, themes, and relationships using Gemini AI
Audiobook Generation — Full TTS audiobooks with multiple provider support (ElevenLabs, Deepgram, Edge TTS)
Visual Scene Gallery — AI-generated illustrations for key scenes with carousel viewer
Character Portraits — Consistent character art with reference sheets and multiple art styles
AI Podcast — Multi-voice conversational podcasts discussing the book
Interactive Storybook — Illustrated page-by-page storybook adaptation
Video Generation — Animated scene videos from generated illustrations
Q&A Chatbot — Ask questions about the book with AI-powered answers
Book Library — Persistent library with cover art, metadata, and reload support
Download Bundle — Export all generated content as a ZIP package

Tech Stack

Layer	Technology
Backend	Python, FastAPI, Uvicorn
AI/NLP	Google Gemini, DeepSeek, Spacy
Audio	ElevenLabs, Deepgram, Edge TTS
Image Gen	Pollinations AI, DeAPI
Document Processing	PyPDF2, EbookLib, Tesseract OCR
Frontend	Vanilla HTML/CSS/JS (Glassmorphism UI)
Database	SQLModel (SQLite)

Setup

Clone the repository:

git clone https://github.com/your-username/book2vision.git
cd book2vision

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Install Tesseract OCR (required for scanned PDFs):
- Windows: Download from UB Mannheim
- macOS: brew install tesseract
- Linux: sudo apt-get install tesseract-ocr
Download the Spacy language model:
```
python -m spacy download en_core_web_sm
```

Create a .env file in the root directory (see .env.example):

GEMINI_API_KEY=your_key_here
ELEVENLABS_API_KEY=your_key_here
DEEPSEEK_API_KEY=your_key_here
DEAPI_API_KEY=your_key_here
DEEPGRAM_API_KEY=your_key_here

How to Run

Start the server:
```
python src/server.py
```
Open your browser and navigate to http://localhost:8000
Upload a book to start the transformation.

Project Structure

book2vision/
├── src/
│   ├── server.py          # FastAPI app + route setup
│   ├── routers/           # API route modules
│   │   ├── upload.py      # Book upload & ingestion
│   │   ├── generation.py  # Audio, visuals, portraits
│   │   ├── content.py     # Story, Q&A, podcast, storybook
│   │   └── library.py     # Library CRUD operations
│   ├── models.py          # Pydantic request models
│   ├── state.py           # App state & shared config
│   ├── ingestion.py       # Document parsing
│   ├── analysis.py        # Semantic analysis
│   ├── audio.py           # TTS audio generation
│   ├── visuals.py         # Image generation
│   ├── knowledge.py       # Q&A and quizzes
│   ├── podcast.py         # Podcast generation
│   ├── storybook.py       # Storybook generation
│   ├── video.py           # Video generation
│   ├── library.py         # Library manager
│   └── config.py          # Environment config
├── web/                   # Frontend (HTML/CSS/JS)
├── tests/                 # Test suite
├── scripts/               # Dev utilities & debug scripts
└── requirements.txt

License

This project is for educational and personal use.

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
scripts		scripts
src		src
tests		tests
web		web
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
PROJECT_OVERVIEW.md		PROJECT_OVERVIEW.md
Procfile		Procfile
README.md		README.md
diag.py		diag.py
fix_dashboard.py		fix_dashboard.py
patch_layout.py		patch_layout.py
pipeline.md		pipeline.md
render.yaml		render.yaml
requirements.txt		requirements.txt
test_deapi_key.py		test_deapi_key.py
test_integration.py		test_integration.py
test_pollinations_tts.py		test_pollinations_tts.py
test_pollinations_tts_post.py		test_pollinations_tts_post.py
test_sampler.py		test_sampler.py
voice_clone_server.py		voice_clone_server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Book2Vision

Features

Tech Stack

Setup

How to Run

Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Book2Vision

Features

Tech Stack

Setup

How to Run

Project Structure

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages