Illustrated storybooks from a child's words — not the AI's.
"The sun came down to the ground. It was sad and tired. Children came and threw pink, blue, yellow powder on it like Holi. The sun smiled and went back to the sky." — story by a 7-year-old, May 2026. The illustration above is what StoryCanvas painted around her words.
A local web app that turns a story prompt into an illustrated picture book — text from Gemini 2.5 Flash, illustrations from Gemini 2.5 Flash Image, exported as a printable PDF. Built in a day as a working concept piece.
The interesting part is what it does not do. Most AI-for-kids tools take a child's seed and write a whole new story around it. StoryCanvas defaults to the opposite: it treats the child's words as the story, fixes only spelling and grammar, and lets the AI handle pagination and illustration. The child keeps authorship. The AI is a visual co-creator, not a ghostwriter.
That's the wedge.
A three-position dial on every story:
| Level | Name | What the AI does |
|---|---|---|
| 0 | Faithful (default) | Preserves the child's exact wording. Fixes only spelling and grammar. Splits text into pages at natural breaks. Generates illustrations from what is literally in the text. Adds no new plot, characters, dialogue, or themes. |
| 1 | Light | Keeps the child's wording, plot, and characters. May add brief connective sentences for pacing or make implicit visual details explicit. No new plot events. |
| 2 | Free | Uses the prompt as a creative seed. Invents characters, dialogue, scenes — a fully-developed AI story. (This is how most competing tools work.) |
Prompt (a real story by a 7-year-old): "The sun came down to the ground. It was sad and tired. Children came and threw pink, blue, yellow powder on it like Holi. The sun smiled and went back to the sky."
|
Liberty 0 · Faithful Her words. Untouched.
|
Liberty 2 · Free The AI invents characters and expands.
|
Title on the left: The Sun's Colors — a title the LLM generated for her story, but every sentence on the page is hers. Title on the right: The Day the Sun Needed a Splash of Color — a different story entirely, with named characters (Lily, Tom) the AI made up.
For a child, a teacher, or a parent who wants to preserve a kid's voice, Faithful is the right default. Free is great for "I have a one-line idea, fill it in." Light sits between.
git clone https://github.com/agaonker/StoryCanvas.git
cd StoryCanvas
npm install
# Get a key at https://aistudio.google.com/apikey
# Image generation requires paid billing on the Google AI account.
cp .env.local.example .env.local
# edit .env.local and paste your GEMINI_API_KEY
npm run dev
# open http://localhost:3000Optional environment variables (in .env.local):
GEMINI_API_KEY=AIza...
DAILY_BUDGET_USD=10 # default 5; pauses generation when reachedBrowser
│ POST { prompt, pages, liberty }
▼
/api/story ──► blocklist ──► rate-limit (per-IP)
│
▼
budget check
│
▼
pre-flight classifier
(Gemini 2.5 Flash)
│
▼
main story generation
(Gemini 2.5 Flash, structured
output, Zod-validated)
│ { title, character_bible, pages[] }
▼
Browser renders pages with skeleton image placeholders
│ for each page in parallel:
▼
/api/image
budget check
│
▼
Gemini 2.5 Flash Image
+ watercolor style prefix
+ character bible threaded
into every image prompt
│
▼
Image fades into the page
The character bible is the consistency trick: the LLM produces a short visual description per character once, then every image prompt prepends those descriptions. Imagen will still drift between pages, but characters remain recognizably the same person.
A typical session in three frames:
When all pages are ready, an Export to PDF button appears that opens the browser's print dialog. The print stylesheet renders one story page per printed sheet — image on top, text below — so the resulting PDF is a clean, foldable picture book ready for a refrigerator door.
Children's image generation is a category with real liability. Four layers, cheapest reject first:
- Blocklist — narrow regex catches slurs, explicit sexual terms, CSAM signal words, extreme violence. Rejects before any API call (sub-millisecond).
- Per-IP rate limit — 5 stories per IP per hour, sliding window, in-memory.
- Daily budget kill switch — server-side cost counter (default $5/day). When exceeded, story and image endpoints return a friendly "come back tomorrow" placeholder until UTC midnight.
- Pre-flight classifier — one short Gemini 2.5 Flash call before the main generation. Returns
{safe, category, reason}. Rejects unsafe prompts (~$0.0001, ~1s).
On top of these, the main story call sets Gemini safetySettings to BLOCK_MEDIUM_AND_ABOVE on harm categories (BLOCK_LOW_AND_ABOVE for sexually-explicit), wraps user input in <user_story> tags as a prompt-injection defense, and the system instruction explicitly tells the model to treat the input as data not instructions. Imagen's built-in content policy adds an output-side check; refused images render as a placeholder card without breaking the rest of the story.
Every request gets a line in artifacts/prompts.jsonl — prompt preview, pages, liberty, status, cost, refusal reason if any. Useful forensics if anything ever feels off.
Two real refusals captured during dogfooding:
The tiger prompt is an instructive edge case — a tiger eating a deer is real ecology, and a slightly tamer version would pass. But "drink their blood, eat their flesh for fun, he loved meat" is the framing that tips it from nature into gratuitous, and the classifier catches the framing. This is the kind of decision that's hard to encode in a regex and easy for an LLM-as-classifier to handle with context.
What's not here yet (intentionally — this is a local single-user demo):
- COPPA/FERPA compliance flows
- Output moderation (post-generation scan of the story text)
- Auth, sessions, user accounts
- Persistent storage of stories
- Production observability (just
console.logfor now)
The local demo is a real product hypothesis underneath, framed honestly.
The wedge — Faithful mode aligns with what teachers and many parents actually want: AI as scaffolding for the child's voice, not a replacement for it. Every other AI-for-kids storybook tool I've seen treats the child's input as a seed, not as the story. That's the differentiation.
The artifact — A printed picture book the child made themselves is the marketing channel. Parent shows it on Instagram. Grandparent shows it at the family group chat. The product makes its own distribution.
A staged go-to-market:
| Stage | Audience | Motion | Time horizon |
|---|---|---|---|
| Personal | Me + my daughter | Open source, free | Today |
| Consumer | Parents of 3-10 year olds | $5-7/mo SaaS, organic via printed-book artifacts | 6-12 months |
| Bottom-up classroom | Teachers on Twitter / TikTok / LinkedIn | Free teacher tier + paid classroom plan | 12-24 months |
| Direct schools | Districts, RFPs, conferences | Sales-led, requires founder network or pivot | 2-4 years |
The honest read: K-12 sales is brutal, AI in schools is politically charged in 2026, and image-gen safety liability is existential at scale. Parents-first is the path of least resistance. Schools come after consumer proof, not before.
What would need to be true to make this real:
- Voice input (most preschoolers don't type)
- Persistent character library across stories
- Co-creation mode for parent + child
- Teacher dashboard with student-work visibility
- Curriculum-standards mapping (CCSS-ELA W.K.3, etc.)
- COPPA-compliant under-13 data handling
- Cheaper image provider or self-hosted Stable Diffusion to compress margins at scale
| Feature | Status |
|---|---|
| Story generation, 3 / 6 / 12 pages | ✅ |
| Watercolor illustrations, character-consistent within a story | ✅ |
| Creative Liberty Index (Faithful / Light / Free) | ✅ |
| Print-to-PDF export | ✅ |
| Blocklist + rate limit + budget kill switch + classifier | ✅ |
| Audit log of every request | ✅ |
| Local single-user | ✅ |
| Multi-user / auth / persistence | ❌ |
| Voice input | ❌ |
| Teacher dashboard | ❌ |
| Deployable (Vercel, etc.) | next dev only, deploy untested |
- Next.js 15 + React 19 + TypeScript + Tailwind v4
@google/genaiSDK (Gemini 2.5 Flash for text, Gemini 2.5 Flash Image for images)- Zod for schema validation
- No DB, no auth, no infrastructure
Total dependencies: 5 runtime, 8 dev.
MIT — see LICENSE.







