Skip to content

Latest commit

 

History

History
85 lines (63 loc) · 2.98 KB

File metadata and controls

85 lines (63 loc) · 2.98 KB

Changelog

All notable changes to Voicebox will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.1.0 - 2026-01-25

Added

Core Features

  • Voice Cloning - Clone voices from audio samples using Qwen3-TTS (1.7B and 0.6B models)
  • Voice Profile Management - Create, edit, and organize voice profiles with multiple samples
  • Speech Generation - Generate high-quality speech from text using cloned voices
  • Generation History - Track all generations with search and filtering capabilities
  • Audio Transcription - Automatic transcription powered by Whisper
  • In-App Recording - Record audio samples directly in the app with waveform visualization

Desktop App

  • Tauri Desktop App - Native desktop application for macOS, Windows, and Linux
  • Local Server Mode - Embedded Python server runs automatically
  • Remote Server Mode - Connect to a remote Voicebox server on your network
  • Auto-Updates - Automatic update notifications and installation

API

  • REST API - Full REST API for voice synthesis and profile management
  • OpenAPI Documentation - Interactive API docs at /docs endpoint
  • Type-Safe Client - Auto-generated TypeScript client from OpenAPI schema

Technical

  • Voice Prompt Caching - Fast regeneration with cached voice prompts
  • Multi-Sample Support - Combine multiple audio samples for better voice quality
  • GPU/CPU/MPS Support - Automatic device detection and optimization
  • Model Management - Lazy loading and VRAM management
  • SQLite Database - Local data persistence

Technical Details

  • Built with Tauri v2 (Rust + React)
  • FastAPI backend with async Python
  • TypeScript frontend with React Query and Zustand
  • Qwen3-TTS for voice cloning
  • Whisper for transcription

Platform Support

  • macOS (Apple Silicon and Intel)
  • Windows
  • Linux (AppImage)

[Unreleased]

Fixed

  • Audio export failing when Tauri save dialog returns object instead of string path

Added

  • Makefile - Comprehensive development workflow automation with commands for setup, development, building, testing, and code quality checks
    • Includes Python version detection and compatibility warnings
    • Self-documenting help system with make help
    • Colored output for better readability
    • Supports parallel development server execution

Changed

  • README - Added Makefile reference and updated Quick Start with Makefile-based setup instructions alongside manual setup

[Unreleased - Planned]

Planned

  • Real-time streaming synthesis
  • Conversation mode with multiple speakers
  • Voice effects (pitch shift, reverb, M3GAN-style)
  • Timeline-based audio editor
  • Additional voice models (XTTS, Bark)
  • Voice design from text descriptions
  • Project system for saving sessions
  • Plugin architecture