VoiceForge

VoiceForge is a browser-based assistive video tool that lets a user type during calls and output cloned speech with a lip-synced face preview.

📑 Table of Contents

Why This Exists
Tech Stack
Browser Compatibility
Prerequisites
Setup
Environment Variables
Using VoiceForge In A Call
OBS Virtual Camera Setup
API
Roadmap
License
About

Why This Exists

Deaf and speech-impaired people on video calls are often pushed into chat boxes, delayed interpretation, or awkward turn-taking. VoiceForge explores a local-first interface where typed intent can become spoken audio and a synchronized visual feed, helping the user participate in the same conversational channel as everyone else.

Tech Stack

Browser Compatibility

VoiceForge targets Chrome and Edge only. Support for WebRTC Insertable Streams and canvas capture is still uneven across browsers, so Firefox and Safari are not supported for the virtual camera MVP.
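
The browser requirement above can be checked at runtime with feature detection. The sketch below is illustrative only and is not taken from the VoiceForge source: `detectCaptureSupport` is a hypothetical helper name, and it assumes the relevant features are `HTMLCanvasElement.prototype.captureStream` plus the `MediaStreamTrackProcessor` and `MediaStreamTrackGenerator` constructors.

```javascript
// Hypothetical helper (not part of VoiceForge): reports which capture
// features the given global object exposes. Pass `window` in a browser.
function detectCaptureSupport(globalObj) {
  const canvasProto =
    globalObj.HTMLCanvasElement && globalObj.HTMLCanvasElement.prototype;
  return {
    // Canvas capture is what the MVP virtual camera relies on.
    canvasCapture: Boolean(
      canvasProto && typeof canvasProto.captureStream === "function"
    ),
    // Insertable Streams constructors, needed for frame replacement.
    insertableStreams:
      typeof globalObj.MediaStreamTrackProcessor === "function" &&
      typeof globalObj.MediaStreamTrackGenerator === "function",
  };
}
```

In a page this could be called as `detectCaptureSupport(window)` to show a compatibility notice before the user reaches the Call page.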

Prerequisites

VoiceForge's voice cloning engine is free to use: no paid API plan, no account sign-up, and no API key required.

It is powered by ResembleAI/Chatterbox-Multilingual-TTS, a production-grade multilingual voice cloning model hosted as a public Hugging Face Space. The server connects to it using the official @gradio/client bridge package, which is installed automatically with npm install.

What you need:

Node.js 18 or newer
npm 9 or newer
Chrome or Edge (for the virtual camera feature)
An internet connection when running in live mode (see Environment Variables for offline mock mode)

Setup

1. Install Node.js 18 or newer.
2. From the repository root, install all dependencies (this includes @gradio/client):

    npm install

3. Copy the example environment file:

    cp .env.example .env

4. (Optional) Open .env and review the settings. The defaults run in offline mock mode, so no API key or internet access is needed. See Environment Variables for the full reference.
5. Start the client and server together:

    npm run dev

6. Open http://localhost:5173 in Chrome or Edge.

Environment Variables

All variables live in your local .env file (copy from .env.example). None of them require a paid account or API key.

| Variable | Default | Description |
| --- | --- | --- |
| `VOICE_ENGINE_SPACE` | (commented out) | The Hugging Face Gradio Space used for voice synthesis. See the dual-mode setup below. |
| `MOCK_CHATTERBOX` | `true` | Controls whether the live AI or an offline test stub is used. See below. |
| `PORT` | `3001` | Express API port. |
| `CLIENT_URL` | `http://localhost:5173` | Allowed CORS origin for the Vite dev server. |
| `STREAM_SECRET` | (auto-generated) | AES-256-GCM key that protects speech stream tokens. Set a fixed value so tokens stay valid across server restarts. |
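
To illustrate why a fixed `STREAM_SECRET` matters, here is a minimal sketch of sealing and opening a token with AES-256-GCM using Node's built-in `crypto` module. It is not the VoiceForge implementation: the function names, the SHA-256 key derivation, the token layout (IV, auth tag, ciphertext), and the payload shape are all assumptions made for the example.

```javascript
const crypto = require("node:crypto");

// Assumption: a fixed secret is hashed to a 32-byte key; with no secret a
// random key is generated, so tokens stop working after a restart.
function deriveKey(secret) {
  return secret
    ? crypto.createHash("sha256").update(secret).digest()
    : crypto.randomBytes(32);
}

// Encrypts and authenticates a JSON payload. Layout: iv(12) | tag(16) | data.
function sealToken(key, payload) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([
    cipher.update(JSON.stringify(payload), "utf8"),
    cipher.final(),
  ]);
  return Buffer.concat([iv, cipher.getAuthTag(), data]).toString("base64url");
}

// Throws if the token was tampered with or sealed under a different key.
function openToken(key, token) {
  const raw = Buffer.from(token, "base64url");
  const decipher = crypto.createDecipheriv(
    "aes-256-gcm",
    key,
    raw.subarray(0, 12)
  );
  decipher.setAuthTag(raw.subarray(12, 28));
  const data = Buffer.concat([
    decipher.update(raw.subarray(28)),
    decipher.final(),
  ]);
  return JSON.parse(data.toString("utf8"));
}
```

With a key from `deriveKey(process.env.STREAM_SECRET)`, a token sealed before a restart can still be opened afterwards; with the random fallback it cannot.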

Dual-Mode Voice Engine Setup

VoiceForge ships with two engine modes, offline mock and live AI, that you control entirely from .env. Live mode can route to either the official Space or a backup mirror.

Offline mock mode - local default

The checked-in .env.example uses mock mode by default:

MOCK_CHATTERBOX=true

This skips all Hugging Face network calls. The server returns a fixture voice_id instantly on clone and streams a short silent audio file on speak. This is ideal for contributors working on UI changes, automated CI pipelines, or offline environments.

Safety: MOCK_CHATTERBOX=true has no effect when NODE_ENV=production. The server logs a yellow warning at startup whenever mock mode is active so it can never be silently enabled.
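
The mock responses described above could look roughly like the following. This is a sketch under assumptions, not the project's actual stub: the fixture id `mock-voice-fixture`, the function names, and the choice to build the silent clip in memory as a mono 16-bit PCM WAV are all hypothetical.

```javascript
// Hypothetical clone stub: returns a fixed voice_id without any network call.
function mockClone() {
  return { voice_id: "mock-voice-fixture" };
}

// Builds a mono 16-bit PCM WAV that contains only silence.
function silentWav(durationMs = 250, sampleRate = 16000) {
  const numSamples = Math.floor((durationMs / 1000) * sampleRate);
  const dataSize = numSamples * 2; // 2 bytes per 16-bit sample
  const buf = Buffer.alloc(44 + dataSize); // zero-filled samples are silence
  buf.write("RIFF", 0, "ascii");
  buf.writeUInt32LE(36 + dataSize, 4); // remaining file size
  buf.write("WAVE", 8, "ascii");
  buf.write("fmt ", 12, "ascii");
  buf.writeUInt32LE(16, 16); // fmt chunk size
  buf.writeUInt16LE(1, 20); // audio format: PCM
  buf.writeUInt16LE(1, 22); // channels: mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28); // byte rate
  buf.writeUInt16LE(2, 32); // block align
  buf.writeUInt16LE(16, 34); // bits per sample
  buf.write("data", 36, "ascii");
  buf.writeUInt32LE(dataSize, 40);
  return buf;
}
```

A stub like this keeps UI work and CI runs independent of the Hugging Face Space while still returning a playable audio response.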

Live AI mode - official production engine

Leave VOICE_ENGINE_SPACE commented out with a #. The server will automatically route all synthesis requests to the official production Space:

# VOICE_ENGINE_SPACE=ResembleAI/Chatterbox-Multilingual-TTS
MOCK_CHATTERBOX=false

This is the recommended setting for end-users and deployed environments.

Live AI mode - independent backup mirror

If the official space is temporarily busy or you prefer to route through an independent mirror, uncomment the line and point it at the community-maintained backup:

VOICE_ENGINE_SPACE=itzzavdheshh/voiceforge-engine
MOCK_CHATTERBOX=false

This mirror runs the same Chatterbox Multilingual model. Useful when the primary space is under heavy load or during extended development sessions.
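
The routing rules in this section can be summarized in a small resolver. The sketch below is an illustration, not the server's actual code: `resolveEngine` is a hypothetical name, and it assumes that only the literal string `true` enables mock mode and that an unset or empty `VOICE_ENGINE_SPACE` falls back to the official Space.

```javascript
const OFFICIAL_SPACE = "ResembleAI/Chatterbox-Multilingual-TTS";

// Hypothetical resolver: turns environment variables into an engine config.
function resolveEngine(env) {
  const wantsMock = env.MOCK_CHATTERBOX === "true";
  const isProduction = env.NODE_ENV === "production";
  // Mock mode is ignored in production, as described in the safety note.
  const isMock = wantsMock && !isProduction;
  // A commented-out or empty variable routes to the official Space.
  const space = (env.VOICE_ENGINE_SPACE || "").trim() || OFFICIAL_SPACE;
  return {
    isMock,
    space,
    warning: isMock ? "Mock mode is active: no live synthesis" : null,
  };
}
```

Calling `resolveEngine(process.env)` once at startup and logging `warning` when it is set would match the behavior described for mock mode.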

Using VoiceForge In A Call

1. Open VoiceForge in Chrome or Edge.
2. Record a 10-second consent-based reference clip.
3. Clone the voice and continue to the Call page.
4. Allow webcam access.
5. Type a phrase and press Enter or Speak.
6. Turn on Go Live to expose the canvas stream inside the browser.
7. In Zoom, Google Meet, or Microsoft Teams, open camera settings and select the virtual camera source you have configured.
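
The Go Live step relies on canvas capture. A minimal sketch of that idea, using the standard `canvas.captureStream()` browser API, is shown below. The helper name `startGoLive` and the 30 fps default are assumptions for illustration and may not match how VoiceForge wires this up.

```javascript
// Hypothetical helper: turns the preview canvas into a live MediaStream.
function startGoLive(canvas, fps = 30) {
  if (typeof canvas.captureStream !== "function") {
    throw new Error("canvas.captureStream is not available in this browser");
  }
  // captureStream(fps) returns a MediaStream with one video track.
  const stream = canvas.captureStream(fps);
  // Stopping every track releases the capture when Go Live is turned off.
  const stop = () => stream.getTracks().forEach((track) => track.stop());
  return { stream, stop };
}
```

The returned `stream` can be attached to a `<video>` element through its `srcObject` property for a local preview, and `stop()` can be called when the toggle is switched off.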

OBS Virtual Camera Setup

Most video call apps cannot directly select a browser tab as a system camera. For the MVP, install OBS Studio and use OBS Virtual Camera as the bridge.

1. Install OBS Studio.
2. Add a Browser Source pointing to http://localhost:5173. Set the width to 1920 and height to 1080 to capture the full interface.
3. Crop the source to focus on the lip-synced output preview.
4. Click Start Virtual Camera in the OBS Controls panel.
5. Select OBS Virtual Camera as your camera in your preferred video call application.

Video Call App Configuration

Zoom: Go to Settings > Video > Camera and select OBS Virtual Camera.

Google Meet: Go to Settings > Video > Camera and select OBS Virtual Camera.

Microsoft Teams: Go to Settings > Devices > Camera and select OBS Virtual Camera.

For detailed setup guides (including Discord and Webex) and troubleshooting tips, see our Virtual Camera Guide.

API

| Method | Endpoint | Description |
| --- | --- | --- |
| `POST` | `/api/voice/clone` | Upload reference audio. Stores it server-side and returns a `voice_id`. No external API call in mock mode. |
| `POST` | `/api/voice/speak` | Send text, `voice_id`, and optional voice settings. Returns a signed `speechId` and streaming `audioUrl`. |
| `GET` | `/api/voice/speak/stream?t=<speechId>` | Stream the Chatterbox-generated audio for a pending signed speech token (`t`). Proxied from the Hugging Face Space. |
| `GET` | `/api/voice/status` | Returns current engine mode (`isMock`, `space`) for debugging. |
| `GET` | `/api/health` | Returns local API health status. |
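
As a rough illustration of the speak flow, the sketch below builds the two requests a client would make. It is based only on the table above: the helper names are hypothetical, and the JSON body with `text`, `voice_id`, and `voice_settings` fields is an assumption about the request format.

```javascript
// Hypothetical helper: builds the POST /api/voice/speak request.
function buildSpeakRequest(baseUrl, text, voiceId, voiceSettings) {
  const body = { text, voice_id: voiceId };
  // Assumed field name for the optional voice settings.
  if (voiceSettings) body.voice_settings = voiceSettings;
  return {
    url: new URL("/api/voice/speak", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: builds the stream URL for a returned speechId.
function buildStreamUrl(baseUrl, speechId) {
  const url = new URL("/api/voice/speak/stream", baseUrl);
  url.searchParams.set("t", speechId);
  return url.toString();
}
```

A client could pass `url` and `init` to `fetch`, read `speechId` or `audioUrl` from the JSON response, and then set the stream URL as the `src` of an `Audio` element.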

Roadmap

Done: Store cloned voice profiles and reference audio Blobs in IndexedDB via client/src/utils/db.js.
Done: Stream TTS audio through POST /api/voice/speak and GET /api/voice/speak/stream.
Done: Replaced ElevenLabs with the free ResembleAI Chatterbox Multilingual TTS engine via @gradio/client.
In progress: Voice tuning controls are wired through persisted voice_settings; multilingual output supports 23 languages via Chatterbox, with dedicated language controls in the UI.
In progress: The MVP virtual camera uses canvas capture; full WebRTC Insertable Streams frame replacement remains future work.
TODO: Replace the placeholder models/wav2lip.onnx with a real lightweight browser Wav2Lip ONNX model.
TODO: Implement real ONNX Runtime Web Wav2Lip inference.
Done: Replace the fallback mouth animation with model-driven mouth movement.
Done: Add richer virtual camera documentation for OBS and each call provider.
TODO: Add automated browser tests for camera and microphone permission flows.
TODO: Persist voice profiles across server restarts (database or object-store backend).

License

MIT
