StoryCanvas

Illustrated storybooks from a child's words — not the AI's.

"The sun came down to the ground. It was sad and tired. Children came and threw pink, blue, yellow powder on it like Holi. The sun smiled and went back to the sky." — story by a 7-year-old, May 2026. The illustration above is what StoryCanvas painted around her words.

What this is

A local web app that turns a story prompt into an illustrated picture book — text from Gemini 2.5 Flash, illustrations from Gemini 2.5 Flash Image, exported as a printable PDF. Built in a day as a working concept piece.

The interesting part is what it does not do. Most AI-for-kids tools take a child's seed and write a whole new story around it. StoryCanvas defaults to the opposite: it treats the child's words as the story, fixes only spelling and grammar, and lets the AI handle pagination and illustration. The child keeps authorship. The AI is a visual co-creator, not a ghostwriter.

That's the wedge.

The Creative Liberty Index

A three-position dial on every story:

Level	Name	What the AI does
0	Faithful (default)	Preserves the child's exact wording. Fixes only spelling and grammar. Splits text into pages at natural breaks. Generates illustrations from what is literally in the text. Adds no new plot, characters, dialogue, or themes.
1	Light	Keeps the child's wording, plot, and characters. May add brief connective sentences for pacing or make implicit visual details explicit. No new plot events.
2	Free	Uses the prompt as a creative seed. Invents characters, dialogue, scenes — a fully-developed AI story. (This is how most competing tools work.)
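
One way to picture the dial in code is as a lookup from level to the rules handed to the model. This is a minimal sketch, not the repo's implementation: the names `parseLiberty`, `libertyInstruction`, and `LIBERTY_RULES`, and the exact rule wording, are assumptions made for illustration.

```typescript
// Hypothetical sketch: identifiers and instruction wording are illustrative only.
type Liberty = 0 | 1 | 2;

const LIBERTY_NAMES: Record<Liberty, string> = {
  0: "Faithful",
  1: "Light",
  2: "Free",
};

const LIBERTY_RULES: Record<Liberty, string[]> = {
  0: [
    "Preserve the child's exact wording; fix only spelling and grammar.",
    "Split the text into pages at natural breaks.",
    "Add no new plot, characters, dialogue, or themes.",
  ],
  1: [
    "Keep the child's wording, plot, and characters.",
    "You may add brief connective sentences for pacing.",
    "Add no new plot events.",
  ],
  2: [
    "Treat the prompt as a creative seed.",
    "You may invent characters, dialogue, and scenes.",
  ],
};

// Anything that is not exactly 1 or 2 falls back to Faithful, the documented default.
function parseLiberty(value: unknown): Liberty {
  if (value === 1) return 1;
  if (value === 2) return 2;
  return 0;
}

// Render the rules for one level as a block of text for a system instruction.
function libertyInstruction(level: Liberty): string {
  const rules = LIBERTY_RULES[level].map((rule) => `- ${rule}`).join("\n");
  return `Creative liberty: ${LIBERTY_NAMES[level]}\n${rules}`;
}
```

Falling back to level 0 on bad input keeps the safest behavior (the child's words, untouched) as the failure mode.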

Side-by-side: same prompt, different liberty

Prompt (a real story by a 7-year-old): "The sun came down to the ground. It was sad and tired. Children came and threw pink, blue, yellow powder on it like Holi. The sun smiled and went back to the sky."

Liberty 0 · Faithful
Her words. Untouched.

A 3-page storybook titled 'The Sun's Colors' with the child's original four sentences split across three watercolor illustrations

Liberty 2 · Free
The AI invents characters and expands.

A 3-page storybook titled 'The Day the Sun Needed a Splash of Color' featuring invented characters Lily and Tom in much longer prose

Title on the left: The Sun's Colors — a title the LLM generated for her story, but every sentence on the page is hers. Title on the right: The Day the Sun Needed a Splash of Color — a different story entirely, with named characters (Lily, Tom) the AI made up.

For a child, a teacher, or a parent who wants to preserve a kid's voice, Faithful is the right default. Free is great for "I have a one-line idea, fill it in." Light sits between.

Quick start

git clone https://github.com/agaonker/StoryCanvas.git
cd StoryCanvas
npm install

# Get a key at https://aistudio.google.com/apikey
# Image generation requires paid billing on the Google AI account.
cp .env.local.example .env.local
# edit .env.local and paste your GEMINI_API_KEY

npm run dev
# open http://localhost:3000

Optional environment variables (in .env.local):

GEMINI_API_KEY=AIza...
DAILY_BUDGET_USD=10     # default 5; pauses generation when reached

How it works

Browser
  │ POST { prompt, pages, liberty }
  ▼
/api/story  ──►  blocklist  ──►  rate-limit (per-IP)
                                      │
                                      ▼
                                 budget check
                                      │
                                      ▼
                                pre-flight classifier
                                (Gemini 2.5 Flash)
                                      │
                                      ▼
                              main story generation
                              (Gemini 2.5 Flash, structured
                              output, Zod-validated)
                                      │ { title, character_bible, pages[] }
                                      ▼
Browser renders pages with skeleton image placeholders
                                      │ for each page in parallel:
                                      ▼
                                /api/image
                                budget check
                                      │
                                      ▼
                              Gemini 2.5 Flash Image
                              + watercolor style prefix
                              + character bible threaded
                              into every image prompt
                                      │
                                      ▼
                              Image fades into the page

The character bible is the consistency trick: the LLM produces a short visual description of each character once, and every image prompt then prepends those descriptions. The image model (Gemini 2.5 Flash Image) will still drift between pages, but characters stay recognizably the same.
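
The threading described above might look roughly like the following. It is a sketch under assumptions: only `title`, `character_bible`, and `pages` come from the diagram; the inner field names (`name`, `description`, `text`), the `STYLE_PREFIX` wording, and `buildImagePrompt` itself are hypothetical.

```typescript
// Hypothetical shapes: only title, character_bible, and pages appear in the README.
interface CharacterEntry {
  name: string;
  description: string; // short visual description, written once per story
}

interface StoryPage {
  text: string;
}

interface Story {
  title: string;
  character_bible: CharacterEntry[];
  pages: StoryPage[];
}

// Assumed wording for the watercolor style prefix.
const STYLE_PREFIX = "Soft watercolor children's book illustration.";

// Build the prompt for one page: style first, then the full character bible,
// then the page text. Every page receives the same bible.
function buildImagePrompt(story: Story, pageIndex: number): string {
  const page = story.pages[pageIndex];
  if (!page) throw new RangeError(`No page at index ${pageIndex}`);
  const bible = story.character_bible
    .map((c) => `${c.name}: ${c.description}`)
    .join("\n");
  return [
    STYLE_PREFIX,
    "Characters (keep consistent on every page):",
    bible,
    "Scene:",
    page.text,
  ].join("\n");
}
```

Because the bible is repeated verbatim in every prompt, each parallel image request is self-contained and needs no shared state.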

Walkthrough

A typical session in three frames:

1. Start. Pick how many pages and how much liberty you want the AI to take. Default is Faithful.
2. Write. Paste a prompt or a child's draft. The Create button enables at 10+ characters.
3. Generate. ~5 seconds for the story text, then images stream in over ~7 seconds each (in parallel).

When all pages are ready, an Export to PDF button appears that opens the browser's print dialog. The print stylesheet renders one story page per printed sheet — image on top, text below — so the resulting PDF is a clean, foldable picture book ready for a refrigerator door.

Safety guardrails

Children's image generation is a category with real liability. Four layers, cheapest reject first:

Blocklist — narrow regex catches slurs, explicit sexual terms, CSAM signal words, extreme violence. Rejects before any API call (sub-millisecond).
Per-IP rate limit — 5 stories per IP per hour, sliding window, in-memory.
Daily budget kill switch — server-side cost counter (default $5/day). When exceeded, story and image endpoints return a friendly "come back tomorrow" placeholder until UTC midnight.
Pre-flight classifier — one short Gemini 2.5 Flash call before the main generation. Returns {safe, category, reason}. Rejects unsafe prompts (~$0.0001, ~1s).
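
As an illustration of the daily budget kill switch in the list above, here is a minimal in-memory sketch. The class name `DailyBudget`, its methods, and the choice to trip at "spent >= limit" are assumptions; the actual counter may differ.

```typescript
// Hypothetical sketch of a server-side cost counter that resets at UTC midnight.
function utcDay(d: Date): string {
  // toISOString is always in UTC, so the first 10 chars are the UTC date.
  return d.toISOString().slice(0, 10);
}

class DailyBudget {
  private spentUsd = 0;
  private day: string;
  private readonly limitUsd: number;

  constructor(limitUsd: number = 5, now: Date = new Date()) {
    this.limitUsd = limitUsd;
    this.day = utcDay(now);
  }

  // Zero the counter the first time it is touched on a new UTC day.
  private rollover(now: Date): void {
    const today = utcDay(now);
    if (today !== this.day) {
      this.day = today;
      this.spentUsd = 0;
    }
  }

  // True once the day's spend has reached the limit.
  exhausted(now: Date = new Date()): boolean {
    this.rollover(now);
    return this.spentUsd >= this.limitUsd;
  }

  // Add the cost of a completed API call to today's total.
  record(costUsd: number, now: Date = new Date()): void {
    this.rollover(now);
    this.spentUsd += costUsd;
  }
}
```

An endpoint would call `exhausted()` before contacting the model and `record()` after each billed call. Since the state lives in memory, a server restart resets the counter, which is acceptable for a local single-user demo but not for a deployment.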

On top of these, the main story call sets Gemini safetySettings to BLOCK_MEDIUM_AND_ABOVE on harm categories (BLOCK_LOW_AND_ABOVE for sexually explicit content) and wraps user input in <user_story> tags as a prompt-injection defense; the system instruction explicitly tells the model to treat that input as data, not instructions. The image model's built-in content policy adds an output-side check; refused images render as a placeholder card without breaking the rest of the story.
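
The tag-wrapping defense can be sketched as below. The function name `wrapUserStory` and the decision to strip any delimiter copies from the input are assumptions; the README only states that input is wrapped in <user_story> tags.

```typescript
// Hypothetical sketch of delimiter wrapping for untrusted story text.
const OPEN_TAG = "<user_story>";
const CLOSE_TAG = "</user_story>";

function wrapUserStory(raw: string): string {
  // Remove any copy of the delimiters from the input so the text cannot
  // close the block early. Repeat until stable, because removing one tag
  // can splice the surrounding characters into a new one.
  const tagPattern = /<\/?user_story>/gi;
  let text = raw;
  let previous: string;
  do {
    previous = text;
    text = text.replace(tagPattern, "");
  } while (text !== previous);
  return `${OPEN_TAG}\n${text}\n${CLOSE_TAG}`;
}
```

Wrapping alone does not stop injection; it only gives the system instruction an unambiguous region to point at when it says "treat this as data".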

Every request gets a line in artifacts/prompts.jsonl — prompt preview, pages, liberty, status, cost, refusal reason if any. Useful forensics if anything ever feels off.
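
A JSONL audit line of this kind could be built as follows. The field names and the 80-character preview length are assumptions chosen to mirror the list above (prompt preview, pages, liberty, status, cost, refusal reason), not the actual schema.

```typescript
// Hypothetical schema: field names are illustrative, not taken from the repo.
type Status = "ok" | "refused" | "error";

interface AuditEntry {
  timestamp: string; // ISO 8601
  promptPreview: string;
  pages: number;
  liberty: 0 | 1 | 2;
  status: Status;
  costUsd: number;
  refusalReason?: string; // present only when the request was refused
}

// Collapse whitespace and truncate so the log never stores a full story.
function promptPreview(prompt: string, maxChars: number = 80): string {
  const flat = prompt.replace(/\s+/g, " ").trim();
  return flat.length <= maxChars ? flat : flat.slice(0, maxChars - 1) + "…";
}

// One JSON object per line; JSON.stringify escapes embedded newlines,
// so each entry is guaranteed to occupy exactly one line.
function toAuditLine(entry: AuditEntry): string {
  return JSON.stringify(entry) + "\n";
}
```

Appending the returned string to artifacts/prompts.jsonl (for example with Node's `fs.appendFile`) is left out to keep the sketch free of I/O.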

Guardrails in action

Two real refusals captured during dogfooding:

Classifier refusal
"a tiger who kills deer for fun, eats flesh, drinks blood"
Gemini Flash classifier returns safe: false, category: "violence". Cost: ~$0.0001. Rejected in ~1.5s before any story generation runs.

Rate limit
Same address, 6th attempt within an hour
In-memory sliding window per IP. Returns 429 with a friendly "try again in N minutes" message and a Retry-After header.
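
The rate-limit refusal above can be reproduced with a small in-memory sliding window. This is a sketch, assuming a per-IP list of timestamps; the class name `SlidingWindowLimiter` and its return shape are hypothetical.

```typescript
// Hypothetical sketch: 5 requests per IP per rolling hour, kept in memory.
type LimitResult =
  | { allowed: true }
  | { allowed: false; retryAfterSec: number };

class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 5, windowMs: number = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(ip: string, nowMs: number = Date.now()): LimitResult {
    // Keep only timestamps that are still inside the window.
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(ip) ?? []).filter((t) => t > cutoff);

    if (recent.length >= this.limit) {
      this.hits.set(ip, recent);
      // The oldest surviving hit decides when a slot frees up.
      const retryAfterSec = Math.ceil((recent[0] + this.windowMs - nowMs) / 1000);
      return { allowed: false, retryAfterSec };
    }

    recent.push(nowMs);
    this.hits.set(ip, recent);
    return { allowed: true };
  }
}
```

A handler would turn a refusal into a 429 response and copy `retryAfterSec` into the Retry-After header. Refused attempts are not recorded, so hammering the endpoint does not extend the lockout.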

The tiger prompt is an instructive edge case — a tiger eating a deer is real ecology, and a slightly tamer version would pass. But "drink their blood, eat their flesh for fun, he loved meat" is the framing that tips it from nature into gratuitous, and the classifier catches the framing. This is the kind of decision that's hard to encode in a regex and easy for an LLM-as-classifier to handle with context.
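
Because the classifier's verdict arrives as model output, it is worth validating before trusting it. The sketch below parses a {safe, category, reason} reply and fails closed on anything malformed; the fail-closed policy, the name `parseVerdict`, and the fallback values are assumptions rather than documented behavior.

```typescript
// Hypothetical sketch: validate the classifier's JSON reply, failing closed.
interface ClassifierVerdict {
  safe: boolean;
  category: string;
  reason: string;
}

function parseVerdict(rawReply: string): ClassifierVerdict {
  const failClosed: ClassifierVerdict = {
    safe: false,
    category: "unparseable",
    reason: "Classifier reply was not a valid verdict.",
  };

  let data: unknown;
  try {
    data = JSON.parse(rawReply);
  } catch {
    return failClosed;
  }
  if (typeof data !== "object" || data === null) return failClosed;

  const { safe, category, reason } = data as Record<string, unknown>;
  // Only a real boolean counts; "yes", 1, or a missing field is rejected.
  if (typeof safe !== "boolean") return failClosed;

  return {
    safe,
    category: typeof category === "string" ? category : "none",
    reason: typeof reason === "string" ? reason : "",
  };
}
```

Treating an unreadable reply as unsafe costs an occasional false refusal but never lets a prompt through on a parsing accident.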

What's not here yet (intentionally — this is a local single-user demo):

COPPA/FERPA compliance flows
Output moderation (post-generation scan of the story text)
Auth, sessions, user accounts
Persistent storage of stories
Production observability (just console.log for now)

What this could be

Underneath the local demo is a real product hypothesis, framed honestly.

The wedge: Faithful mode aligns with what teachers and many parents actually want, which is AI as scaffolding for the child's voice, not a replacement for it. Every other AI-for-kids storybook tool I've seen treats the child's input as a seed, not as the story. That's the differentiation.

The artifact: a printed picture book the child made themselves is the marketing channel. A parent shows it on Instagram. A grandparent shows it in the family group chat. The product makes its own distribution.

A staged go-to-market:

Stage	Audience	Motion	Time horizon
Personal	Me + my daughter	Open source, free	Today
Consumer	Parents of 3-10-year-olds	$5-7/mo SaaS, organic via printed-book artifacts	6-12 months
Bottom-up classroom	Teachers on Twitter / TikTok / LinkedIn	Free teacher tier + paid classroom plan	12-24 months
Direct schools	Districts, RFPs, conferences	Sales-led, requires founder network or pivot	2-4 years

The honest read: K-12 sales is brutal, AI in schools is politically charged in 2026, and image-gen safety liability is existential at scale. Parents-first is the path of least resistance. Schools come after consumer proof, not before.

What would need to be true to make this real:

Voice input (most preschoolers don't type)
Persistent character library across stories
Co-creation mode for parent + child
Teacher dashboard with student-work visibility
Curriculum-standards mapping (CCSS-ELA W.K.3, etc.)
COPPA-compliant under-13 data handling
Cheaper image provider or self-hosted Stable Diffusion to compress margins at scale

What's here today

Feature	Status
Story generation, 3 / 6 / 12 pages	✅
Watercolor illustrations, character-consistent within a story	✅
Creative Liberty Index (Faithful / Light / Free)	✅
Print-to-PDF export	✅
Blocklist + rate limit + budget kill switch + classifier	✅
Audit log of every request	✅
Local single-user	✅
Multi-user / auth / persistence	❌
Voice input	❌
Teacher dashboard	❌
Deployable (Vercel, etc.)	⚠️ runs in `next dev` only, deploy untested

Stack

Next.js 15 + React 19 + TypeScript + Tailwind v4
@google/genai SDK (Gemini 2.5 Flash for text, Gemini 2.5 Flash Image for images)
Zod for schema validation
No DB, no auth, no infrastructure

Total dependencies: 5 runtime, 8 dev.

License

MIT — see LICENSE.
