All notable changes to Voicebox will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.1.0 - 2026-01-25
- Voice Cloning - Clone voices from audio samples using Qwen3-TTS (1.7B and 0.6B models)
- Voice Profile Management - Create, edit, and organize voice profiles with multiple samples
- Speech Generation - Generate high-quality speech from text using cloned voices
- Generation History - Track all generations with search and filtering capabilities
- Audio Transcription - Automatic transcription powered by Whisper
- In-App Recording - Record audio samples directly in the app with waveform visualization
- Tauri Desktop App - Native desktop application for macOS, Windows, and Linux
- Local Server Mode - Embedded Python server runs automatically
- Remote Server Mode - Connect to a remote Voicebox server on your network
- Auto-Updates - Automatic update notifications and installation
- REST API - Full REST API for voice synthesis and profile management
- OpenAPI Documentation - Interactive API docs at the /docs endpoint
- Type-Safe Client - Auto-generated TypeScript client from OpenAPI schema
- Voice Prompt Caching - Fast regeneration with cached voice prompts
- Multi-Sample Support - Combine multiple audio samples for better voice quality
- GPU/CPU/MPS Support - Automatic device detection and optimization
- Model Management - Lazy loading and VRAM management
- SQLite Database - Local data persistence
- Built with Tauri v2 (Rust + React)
- FastAPI backend with async Python
- TypeScript frontend with React Query and Zustand
- Qwen3-TTS for voice cloning
- Whisper for transcription
- macOS (Apple Silicon and Intel)
- Windows
- Linux (AppImage)
- Fixed audio export failing when the Tauri save dialog returns an object instead of a string path
- Makefile - Comprehensive development workflow automation with commands for setup, development, building, testing, and code quality checks
  - Includes Python version detection and compatibility warnings
  - Self-documenting help system with make help
  - Colored output for better readability
  - Supports parallel development server execution
- README - Added Makefile reference and updated Quick Start with Makefile-based setup instructions alongside manual setup
- Real-time streaming synthesis
- Conversation mode with multiple speakers
- Voice effects (pitch shift, reverb, M3GAN-style)
- Timeline-based audio editor
- Additional voice models (XTTS, Bark)
- Voice design from text descriptions
- Project system for saving sessions
- Plugin architecture