VoiceTransor v0.9.0 - Beta Release
AI-Powered Speech-to-Text with Local Processing
VoiceTransor converts audio to text using OpenAI's Whisper model, with optional AI text processing powered by Ollama. Everything runs locally - your data never leaves your computer.
🎯 What's New in v0.9.0
Major Features
- ✅ Single universal installer - works on both GPU and CPU systems
- ✅ Automatic GPU detection - CUDA acceleration when available, graceful CPU fallback
- ✅ 98% performance improvement - audio loading now <100ms (was 5+ seconds)
- ✅ Production-ready packaging - professional installer with proper dependencies
- ✅ Comprehensive documentation - English and Chinese user guides
Performance Improvements
- Critical fix: Reduced audio file loading time from 5+ seconds to <100ms
- Optimized ffprobe execution for frozen builds
- Cached path lookups to avoid slow operations
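The path-caching optimization above can be sketched as follows. This is a minimal stdlib-only illustration, not the actual VoiceTransor source; the function name `find_ffprobe` is hypothetical:

```python
import shutil
from functools import lru_cache
from typing import Optional

@lru_cache(maxsize=1)
def find_ffprobe() -> Optional[str]:
    """Resolve the ffprobe executable once and cache the result.

    shutil.which() scans every PATH entry, which can be slow inside a
    PyInstaller frozen build; caching makes repeat lookups effectively free.
    """
    return shutil.which("ffprobe")
```

Callers can then invoke `find_ffprobe()` on every audio import without paying the PATH-scan cost more than once per process.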
GPU Support
- ✅ NVIDIA GPUs (GTX 900+, RTX series) - CUDA 12.1 acceleration
- ✅ Apple Silicon (M1/M2/M3) - Metal Performance Shaders
- ✅ Automatic fallback to CPU if no GPU available
- ℹ️ No separate CPU/GPU versions needed!
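The automatic device selection described above might look like the sketch below. It assumes PyTorch is the backend (as Whisper uses), but degrades to CPU if `torch` is absent; `pick_device` is an illustrative name, not the real API:

```python
def pick_device() -> str:
    """Pick the best available compute device, falling back to CPU.

    Mirrors the "auto" behavior: CUDA on NVIDIA GPUs, MPS (Metal)
    on Apple Silicon, otherwise CPU. Works even if torch is missing.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```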
📥 Download & Installation
System Requirements
Minimum:
- Windows 10 (64-bit)
- 8GB RAM
- 5GB free disk space
Recommended (GPU Acceleration):
- NVIDIA GPU (8GB+ VRAM)
- Driver version >= 525.60
Installation Steps
- Download VoiceTransor-v0.9.0-Windows-x64-Setup.exe (below)
- Run the installer and follow the setup wizard
- Install FFmpeg (required for audio processing):
- Download: https://www.gyan.dev/ffmpeg/builds/
- Choose "ffmpeg-release-essentials.zip"
- Extract the archive and add its bin folder to your PATH
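To confirm the PATH step worked, a quick stdlib check can be used (a hypothetical helper, not part of VoiceTransor itself):

```python
import shutil

def ffmpeg_on_path() -> bool:
    """Return True if both ffmpeg and ffprobe are reachable via PATH."""
    return (shutil.which("ffmpeg") is not None
            and shutil.which("ffprobe") is not None)
```

If this returns False after extracting the zip, the PATH entry most likely points at the archive root rather than its bin folder.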
Optional: Install Ollama for AI Features
Ollama enables AI text processing (summarize, translate, etc.):
- Download from: https://ollama.com/download
- Run: ollama serve
- Pull a model: ollama pull llama3.1:8b
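Once Ollama is serving, an AI step like "summarize" talks to its local REST API. The sketch below builds (but does not send) such a request against Ollama's default `/api/generate` endpoint; the helper name and prompt wording are illustrative, not VoiceTransor's actual code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_summarize_request(text: str,
                            model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a POST request asking a local Ollama model to summarize text."""
    body = json.dumps({
        "model": model,
        "prompt": f"Summarize the following transcript:\n\n{text}",
        "stream": False,  # ask for a single JSON response, not a stream
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
```

Sending the request with `urllib.request.urlopen` returns a JSON body whose `response` field holds the model's output; everything stays on localhost.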
🚀 Quick Start
- Launch VoiceTransor
- Click Import Audio (supports WAV, MP3, M4A, FLAC, etc.)
- Click Transcribe to Text
- Choose settings:
  - Model: base (recommended)
  - Device: auto (uses GPU if available)
  - Language: Auto-detect
- Export as TXT or process with AI
🐛 Known Issues
- First transcription downloads Whisper model (~140MB)
- Windows Defender may show warning (installer is not signed)
⚡ Performance
Transcription Speed (1 hour audio):
- CPU (8-core): ~30-60 min
- NVIDIA RTX 3060: ~2-5 min
- Apple M1 Pro: ~3-6 min
📝 Changelog
Added
- Complete distribution packaging system
- Automated build scripts with one-command builds
- Inno Setup installer for professional Windows installation
- Single universal build strategy (no separate CPU/GPU versions)
- Comprehensive bilingual documentation (English/Chinese)
- FFprobe path caching for better performance
Changed
- Version updated from 0.3.0 to 0.9.0 (Beta release)
- Improved error logging for whisper import failures
Fixed
- Critical performance fix: audio file loading time reduced by 98.4%
  - Issue: subprocess.run() was extremely slow in the PyInstaller frozen environment
  - Solution: use shell=True in the frozen environment to bypass Windows security overhead
  - Result: 5000ms → 80ms (~60x speedup)
- Fixed whisper import errors by including numba/llvmlite dependencies
- Improved GPU memory cleanup logging
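The frozen-build workaround in the critical fix above can be sketched like this. It is an illustration of the technique described in the notes, not the exact VoiceTransor source; `build_probe_command` is a hypothetical helper:

```python
import subprocess
import sys

def build_probe_command(args):
    """Return (command, use_shell) suited to the current runtime.

    Inside a PyInstaller bundle (sys.frozen is set), passing a single
    command string with shell=True avoided ~5s of per-call overhead on
    Windows, per the release notes; normal runs keep the safer list argv.
    """
    if getattr(sys, "frozen", False):
        # list2cmdline applies Windows quoting rules to the fixed argv
        return subprocess.list2cmdline(["ffprobe", *args]), True
    return ["ffprobe", *args], False

def run_ffprobe(args):
    """Run ffprobe using whichever invocation style fits this runtime."""
    cmd, use_shell = build_probe_command(args)
    return subprocess.run(cmd, shell=use_shell, capture_output=True, text=True)
```

Because the argument list is fixed and quoted via `list2cmdline`, the shell=True path does not open the usual injection risk of interpolating untrusted strings into a shell command.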
See CHANGELOG.md for full details.
📜 License
MIT License - see LICENSE
Made with ❤️ using OpenAI Whisper and Ollama