Releases: leonshen/VoiceTransor

VoiceTransor v0.9.0 - Beta Release

03 Nov 02:41

Pre-release

AI-Powered Speech-to-Text with Local Processing

VoiceTransor converts audio to text using OpenAI's Whisper model, with optional AI text processing powered by Ollama. Everything runs locally - your data never leaves your computer.

🎯 What's New in v0.9.0

Major Features

  • Single universal installer - works on both GPU and CPU systems
  • Automatic GPU detection - CUDA acceleration when available, graceful CPU fallback
  • 98% faster audio loading - now <100ms (was 5+ seconds)
  • Production-ready packaging - professional installer with proper dependencies
  • Comprehensive documentation - English and Chinese user guides

Performance Improvements

  • Critical fix: Reduced audio file loading time from 5+ seconds to <100ms
  • Optimized ffprobe execution for frozen builds
  • Cached path lookups to avoid slow operations
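
As an illustration of the cached path lookup, here is a minimal sketch that resolves ffprobe once and reuses the result. The function names find_ffprobe and duration_command are hypothetical and are not taken from the VoiceTransor source; the sketch also assumes ffprobe is discoverable on PATH.

```python
import shutil
from functools import lru_cache
from typing import List, Optional


@lru_cache(maxsize=1)
def find_ffprobe() -> Optional[str]:
    """Search PATH for ffprobe once; later calls return the cached result."""
    return shutil.which("ffprobe")


def duration_command(audio_path: str) -> List[str]:
    """Build an ffprobe command that prints only the duration in seconds."""
    ffprobe = find_ffprobe()
    if ffprobe is None:
        raise FileNotFoundError("ffprobe not found on PATH; install FFmpeg first")
    return [
        ffprobe,
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        audio_path,
    ]
```

Note that lru_cache also caches a failed lookup (None). If FFmpeg is installed while the program is running, calling find_ffprobe.cache_clear() forces a fresh search.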

GPU Support

  • ✅ NVIDIA GPUs (GTX 900+, RTX series) - CUDA 12.1 acceleration
  • ✅ Apple Silicon (M1/M2/M3) - Metal Performance Shaders
  • ✅ Automatic fallback to CPU if no GPU available
  • ℹ️ No separate CPU/GPU versions needed!
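
The fallback order above (CUDA, then Metal, then CPU) can be sketched with PyTorch's availability checks. This is an assumption about how such detection might look, not VoiceTransor's actual code; pick_device and detect_device are hypothetical names.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name, falling back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for accelerators; treat a missing PyTorch as CPU-only."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps attribute.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

Keeping the decision in a pure function (pick_device) makes the fallback order easy to test on machines without a GPU.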

📥 Download & Installation

System Requirements

Minimum:

  • Windows 10 (64-bit)
  • 8GB RAM
  • 5GB free disk space

Recommended (GPU Acceleration):

  • NVIDIA GPU (8GB+ VRAM)
  • Driver version >= 525.60

Installation Steps

  1. Download VoiceTransor-v0.9.0-Windows-x64-Setup.exe (below)
  2. Run the installer and follow the setup wizard
  3. Install FFmpeg (required for audio processing):

Optional: Install Ollama for AI Features

Ollama enables AI text processing (summarize, translate, etc.):

  1. Download from: https://ollama.com/download
  2. Run ollama serve
  3. Pull a model: ollama pull llama3.1:8b
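
Once ollama serve is running, a pulled model can be reached over Ollama's local HTTP API. The sketch below assumes the default address http://localhost:11434 and uses only the Python standard library; build_request and process_text are hypothetical helper names, and VoiceTransor's own integration may work differently.

```python
import json
import urllib.request

# Default address used by `ollama serve` (assumed; change it if you configured another).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(text: str, instruction: str,
                  model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": model,
        "prompt": f"{instruction}\n\n{text}",
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_text(text: str, instruction: str, model: str = "llama3.1:8b") -> str:
    """Send the transcript to Ollama and return the generated text."""
    request = build_request(text, instruction, model)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]
```

For example, process_text(transcript, "Summarize the following transcript:") would return a summary, provided the server is running and the model has been pulled.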

🚀 Quick Start

  1. Launch VoiceTransor
  2. Click Import Audio (supports WAV, MP3, M4A, FLAC, etc.)
  3. Click Transcribe to Text
  4. Choose settings:
    • Model: base (recommended)
    • Device: auto (uses GPU if available)
    • Language: Auto-detect
  5. Export as TXT or process with AI
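
For readers who want to script the same workflow, the steps above correspond roughly to the following calls in the openai-whisper Python package. This is a sketch under the assumption that the package and FFmpeg are installed; transcribe_file and export_txt are hypothetical names, not part of VoiceTransor.

```python
from pathlib import Path


def transcribe_file(audio_path: str, model_name: str = "base",
                    device: str = "cpu") -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily because openai-whisper is a heavy dependency.
    import whisper

    model = whisper.load_model(model_name, device=device)
    # language=None lets Whisper auto-detect the spoken language.
    result = model.transcribe(audio_path, language=None)
    return result["text"].strip()


def export_txt(text: str, out_path: str) -> Path:
    """Write the transcript to a UTF-8 text file and return its path."""
    path = Path(out_path)
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

A typical call would be export_txt(transcribe_file("meeting.mp3"), "meeting.txt"), where both file names are placeholders.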

📖 Documentation

🐛 Known Issues

  • The first transcription downloads the Whisper model (~140MB), so it needs an internet connection and takes longer than later runs
  • Windows Defender may show a warning because the installer is not signed

⚡ Performance

Transcription Speed (1 hour audio):

  • CPU (8-core): ~30-60 min
  • NVIDIA RTX 3060: ~2-5 min
  • Apple M1 Pro: ~3-6 min
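
These figures are easier to compare as a real-time factor (audio length divided by processing time). The small sketch below only converts the numbers listed above; it does not measure anything, and realtime_factor is a hypothetical helper name.

```python
def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """How many minutes of audio are processed per minute of wall-clock time."""
    return audio_minutes / processing_minutes


# (fastest, slowest) processing time in minutes for 60 minutes of audio,
# copied from the list above.
reported = {
    "CPU (8-core)": (30, 60),
    "NVIDIA RTX 3060": (2, 5),
    "Apple M1 Pro": (3, 6),
}

for name, (fastest, slowest) in reported.items():
    low = realtime_factor(60, slowest)
    high = realtime_factor(60, fastest)
    print(f"{name}: {low:.0f}x to {high:.0f}x real time")
```

By this measure the reported GPU times correspond to roughly 12x to 30x real time on the RTX 3060, against 1x to 2x on an 8-core CPU.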

📝 Changelog

Added

  • Complete distribution packaging system
  • Automated build scripts with one-command builds
  • Inno Setup installer for professional Windows installation
  • Single universal build strategy (no separate CPU/GPU versions)
  • Comprehensive bilingual documentation (English/Chinese)
  • FFprobe path caching for better performance

Changed

  • Version updated from 0.3.0 to 0.9.0 (Beta release)
  • Improved error logging for whisper import failures

Fixed

  • Critical performance fix: Audio file loading time reduced by 98.4%
    • Issue: subprocess.run() was extremely slow in the PyInstaller frozen environment
    • Solution: Use shell=True in the frozen environment to bypass Windows security overhead
    • Performance improvement: 5000ms → 80ms (about 62x speedup)
  • Fixed whisper import errors by including numba/llvmlite dependencies
  • Improved GPU memory cleanup logging
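
The frozen-build workaround described under "Fixed" might look like the sketch below. It is an illustration under assumptions, not the project's actual code: is_frozen, to_command_line, and run_tool are hypothetical names, and the sketch relies only on the sys.frozen attribute that PyInstaller sets on bundled executables.

```python
import os
import shlex
import subprocess
import sys
from typing import List, Optional


def is_frozen() -> bool:
    """PyInstaller sets sys.frozen on the bundled executable."""
    return bool(getattr(sys, "frozen", False))


def to_command_line(args: List[str]) -> str:
    """Join arguments into one string suitable for shell=True."""
    if os.name == "nt":
        # list2cmdline follows MSVC argv quoting, not cmd.exe rules,
        # so only pass trusted arguments through this path.
        return subprocess.list2cmdline(args)
    return shlex.join(args)


def run_tool(args: List[str],
             use_shell: Optional[bool] = None) -> subprocess.CompletedProcess:
    """Run an external tool, going through the shell only in frozen builds."""
    if use_shell is None:
        use_shell = is_frozen()
    if use_shell:
        return subprocess.run(to_command_line(args), shell=True,
                              capture_output=True, text=True)
    return subprocess.run(args, capture_output=True, text=True)
```

Because shell=True hands the command string to the system shell, file paths supplied by users must be quoted correctly, which is why the arguments are joined by a quoting helper rather than by plain string concatenation.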

See CHANGELOG.md for full details.

📧 Support

📜 License

MIT License - see LICENSE


Made with ❤️ using OpenAI Whisper and Ollama