WordWave 🌊

Gesture, Sound and AI: A Multimodal Approach to Vocabulary Learning

What if children learned new words by moving, speaking, and playing with an AI companion in the room, rather than filling in worksheets or repeating the same drills?

At a time when AI in education often means more screens and more content, I kept asking myself how I would have wanted to learn as a child. What if learning listened, moved, and played along with you, turning curiosity into sound and motion?

That question became the starting point for WordWave, a multimodal learning system that blends computer vision, real time melody, and an AI agent to turn gestures and speech into playful vocabulary exploration. When a child selects a word, its synonyms and opposites appear as cloud shaped forms drifting across the sky. With a point, a pinch, or a spoken prompt, children bring these words to life, triggering music, sound cues, and visual responses in real time.

An AI agent is woven into the experience as a conversational companion. By saying "Hey Lulu", children can ask what a word means or hear it used in context. Instead of red marks on worksheets, WordWave offers an immersive, low pressure environment where language is explored through movement, sound, and conversation. Learning becomes something children inhabit with their whole body, not something they passively consume on a screen.

📸 Demo

With camera mode on, children use pinch gestures to grab words floating in the sky

Once a word is selected, an OK gesture cues the system to start singing it

Remark: GIF shown without sound. Full experience available in WordWave.

WordWave uses a real time multimodal architecture with a browser based frontend and a lightweight backend deployed on Vercel. Three.js renders the 3D scene and maps hand and gesture inputs directly to interactive word states. Magenta generates structured musical feedback in response to user actions. Speech interaction is handled by an ElevenLabs voice agent, activated via the wake phrase "Hey Lulu", enabling spoken queries and vocal output.

While WordWave is an early prototype, it points toward broader possibilities for embodied learning beyond vocabulary. The same interaction patterns could be extended to concepts like numbers, emotions, or storytelling. Future iterations may explore adaptive difficulty, collaborative play, and integration into classroom settings.

🛠️ Tech Stack

Frontend: React, Vite, Three.js (React Three Fiber), Framer Motion
AI/Voice: ElevenLabs Conversational AI, Google Gemini 2.0
Audio: Tone.js, Magenta.js, Web Audio API
Backend: Convex (Serverless Functions)

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
convex		convex
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
1.gif		1.gif
2.gif		2.gif
README.md		README.md
index.html		index.html
package-lock.json		package-lock.json
package.json		package.json
vite.config.js		vite.config.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WordWave 🌊

📸 Demo

🛠️ Tech Stack

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

WordWave 🌊

📸 Demo

🛠️ Tech Stack

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages