A comprehensive voice-controlled automation platform that transforms your computer into an intelligent voice assistant. Control applications, execute complex workflows, and manage your system entirely through voice commands.
- Grammar-Based Voice Recognition - Advanced Vosk speech recognition with command-specific grammar for superior accuracy
- Neural Text-to-Speech - Powered by Piper TTS for incredibly natural, human-like voice synthesis
- Real Keyboard Simulation - Uses Linux uinput for kernel-level key events that work in any application
- Browser Automation - Control web browsers programmatically with Chromium
- Extensible Plugin System - Add custom commands, workflows, and integrations
- Modern Full-Stack Web - Integrated Leptos CSR frontend with Axum backend for reactive UI
- Remote Access - Tailscale integration for global access
- Real-time Processing - Low-latency voice command execution
- Privacy-First - All processing local, no cloud dependencies
- Script Execution - Secure multi-language script execution engine
- Quick Start
- System Requirements
- Installation
- Configuration
- Usage
- Development
- Troubleshooting
- Architecture
- Contributing
- License
# Clone the repository
git clone https://github.com/rendivs925/vibespeak.git
cd vibespeak
# Run setup (installs dependencies and creates config)
make setup
# Start full-stack development server (recommended)
make dev-fullstack
# Or choose interactively
make dev
Open http://localhost:8080 in your browser to configure and use Vibespeak.
For detailed setup instructions, see SETUP.md.
- OS: Linux (Arch, Ubuntu, Fedora), macOS 10.15+, Windows 10+
- RAM: 2GB
- Disk: 500MB free space
- Microphone: Any standard audio input device
- OS: Linux (Arch/Ubuntu)
- RAM: 4GB+
- Disk: 2GB free space (including voice models)
- CPU: Multi-core processor with AVX support
- Microphone: High-quality USB microphone
Arch Linux:
sudo pacman -S vosk-api alsa-utils cmake fmt spdlog onnxruntime-cpu espeak-ng
Ubuntu/Debian:
sudo apt install libvosk-dev alsa-utils cmake libfmt-dev libspdlog-dev onnxruntime libespeak-ng-dev
macOS (using Homebrew):
brew install vosk cmake fmt spdlog espeak-ng
# ONNX Runtime needs to be installed manually
Windows:
- Install MSVC build tools
- Download Vosk from: https://alphacephei.com/vosk/models
- ONNX Runtime: Download from https://github.com/microsoft/onnxruntime/releases
- Piper will be built from source as part of the setup
Note: Vibespeak requires Piper TTS exclusively for natural voice synthesis. The system builds Piper from source for optimal compatibility.
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
# Verify installation
rustc --version # Should be 1.70+
cargo --version # Should be 1.70+
# Node.js for web interface development
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
# Or using nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 18
nvm use 18
# Clone repository
git clone https://github.com/rendivs925/vibespeak.git
cd vibespeak
# Run automated setup
make setup
This will:
- Check and install system dependencies
- Build Piper TTS from source
- Download high-quality voice models (en_US-amy-medium)
- Generate default configuration
- Set up the web interface
Note: For detailed setup instructions and troubleshooting, see SETUP.md.
# Clone repository
git clone https://github.com/rendivs925/vibespeak.git
cd vibespeak
# Install Rust dependencies
cargo build
# Install web dependencies (optional)
make web-deps
# Generate configuration
make config
# Build Docker image
make docker
# Run in container
make docker-run
Vibespeak excludes large binary files and temporary data from version control:
- Voice Models: models/ - TTS and speech recognition models (60MB+ each)
- Piper TTS: piper/ - Built TTS engine and dependencies
- Temporary Audio: *.wav, *.mp3, etc. - Generated audio files
- Logs: *.log, logs/ - Application logs
- Build Artifacts: target/, Cargo.lock - Rust build outputs
- Node Modules: node_modules/ - Web dependencies
- Repository Size: Keeps the git repository small and fast
- Security: Prevents accidental commit of sensitive data
- Performance: Faster cloning and CI/CD operations
- Privacy: Generated audio files stay local
After cloning the repository, run make setup to download and build all required components.
Vibespeak requires both Vosk language models for speech recognition and Piper voice models for natural speech synthesis:
# Create models directory
mkdir -p model
# Download English model (balanced size and accuracy)
cd model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip
unzip vosk-model-en-us-0.22-lgraph.zip
# Files will be extracted to vosk-model-en-us-0.22-lgraph/
The automated setup builds Piper TTS from source and downloads the recommended voice model. For manual setup:
# Build Piper TTS from source (requires cmake, fmt, spdlog, onnxruntime, espeak-ng)
git clone https://github.com/rhasspy/piper.git
cd piper
mkdir build && cd build
cmake ..
make -j$(nproc)
# Install Piper locally in the project
cp piper ../../piper/
cp -r pi/lib/* ../../piper/lib/
cp -r pi/share/* ../../piper/share/
# Download the recommended voice model (en_US-amy-medium)
cd ../../models
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json
Note: Vibespeak currently uses only the en_US-amy-medium voice model for optimal natural speech quality. Additional voice models can be added manually if needed.
| Model | Size | Accuracy | Use Case |
|---|---|---|---|
| vosk-model-small-en-us-0.15 | 40MB | Good | Development, resource-constrained |
| vosk-model-en-us-0.22-lgraph | 120MB | Very Good | Production, balanced |
| vosk-model-en-us-0.22 | 1.8GB | Excellent | Production, high accuracy |
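As a rough illustration of the trade-off in the table above, the sketch below picks the largest Vosk model that fits a disk budget. The `VoskModel` type and `pick_model` helper are hypothetical names introduced here, not part of Vibespeak, and the sizes are simply the ones listed in the table (1.8GB rounded to 1800 MB).

```rust
/// One row of the Vosk model table (sizes in MB, as listed in the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VoskModel {
    pub name: &'static str,
    pub size_mb: u32,
}

/// The three models from the table, smallest first.
pub const MODELS: [VoskModel; 3] = [
    VoskModel { name: "vosk-model-small-en-us-0.15", size_mb: 40 },
    VoskModel { name: "vosk-model-en-us-0.22-lgraph", size_mb: 120 },
    VoskModel { name: "vosk-model-en-us-0.22", size_mb: 1800 },
];

/// Hypothetical helper: return the largest model whose size fits `budget_mb`,
/// or `None` when even the smallest model is too large.
pub fn pick_model(budget_mb: u32) -> Option<VoskModel> {
    MODELS
        .iter()
        .copied()
        .filter(|m| m.size_mb <= budget_mb)
        .max_by_key(|m| m.size_mb)
}
```

With a 500 MB budget this selects the lgraph model, which matches the path used in the default configuration.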
Vibespeak uses only the highest-quality voice model:
| Voice Model | Quality | Size | Description |
|---|---|---|---|
| en_US-amy-medium | ⭐⭐⭐⭐⭐ | 60MB | Primary voice - Natural female voice, optimized for clarity and expressiveness |
Note: Vibespeak builds Piper TTS from source and uses only the en_US-amy-medium model for consistent, high-quality natural speech synthesis. This neural network-based voice provides significantly more natural speech than traditional TTS engines.
The main configuration file is config/system.json:
{
"commands": [],
"workflows": [],
"scripts": [],
"settings": {
"vosk_model_path": "model/vosk-model-en-us-0.22-lgraph",
"sample_rate": 16000,
"audio_device": null,
"web_server_port": 8080,
"enable_tts": true,
"enable_webrtc": false,
"security_level": "trusted",
"tailscale_enabled": false
}
}
Vibespeak now uses Piper TTS with multiple high-quality voice options:
{
"settings": {
"tts_engine": "piper",
"tts_voice": "natural",
"tts_pitch": 1.0,
"tts_rate": 0.95,
"tts_volume": 0.8
}
}
Available Voices:
- natural - High-quality female voice (recommended)
- male - Natural male voice
- female - Alternative female voice
- fast - Faster speech rate
- slow - Slower, clearer speech
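The TTS settings and voice names above could be validated before use along the following lines. This is a minimal sketch: `VoicePreset`, `TtsSettings`, and the accepted ranges (volume between 0.0 and 1.0, positive pitch and rate) are assumptions made for illustration, not Vibespeak's actual types or limits.

```rust
use std::str::FromStr;

/// The five voice names listed above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VoicePreset {
    Natural,
    Male,
    Female,
    Fast,
    Slow,
}

impl FromStr for VoicePreset {
    type Err = String;

    /// Parse a `tts_voice` value, ignoring case and surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "natural" => Ok(Self::Natural),
            "male" => Ok(Self::Male),
            "female" => Ok(Self::Female),
            "fast" => Ok(Self::Fast),
            "slow" => Ok(Self::Slow),
            other => Err(format!("unknown tts_voice: {other}")),
        }
    }
}

/// Mirrors the tts_* keys from the example configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct TtsSettings {
    pub voice: VoicePreset,
    pub pitch: f32,
    pub rate: f32,
    pub volume: f32,
}

impl Default for TtsSettings {
    /// Defaults copied from the example configuration above.
    fn default() -> Self {
        Self { voice: VoicePreset::Natural, pitch: 1.0, rate: 0.95, volume: 0.8 }
    }
}

impl TtsSettings {
    /// Assumed ranges: volume in 0.0..=1.0, pitch and rate strictly positive.
    pub fn validate(&self) -> Result<(), String> {
        if !(0.0..=1.0).contains(&self.volume) {
            return Err(format!("tts_volume out of range: {}", self.volume));
        }
        if self.pitch <= 0.0 || self.rate <= 0.0 {
            return Err("tts_pitch and tts_rate must be positive".to_string());
        }
        Ok(())
    }
}
```

Rejecting bad values at load time keeps a typo in the configuration from surfacing later as silent or distorted speech.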
- Start Vibespeak: make dev
- Open http://localhost:8080
- Configure voice commands, workflows, and scripts through the web interface
{
"settings": {
"sample_rate": 44100,
"audio_device": "hw:1,0",
"noise_reduction": true,
"echo_cancellation": true
}
}
{
"settings": {
"security_level": "trusted",
"allowed_paths": ["/home/user", "/tmp"],
"blocked_commands": ["rm -rf", "sudo"]
}
}
{
"settings": {
"web_server_port": 8080,
"tailscale_enabled": true,
"tailscale_interface": "tailscale0",
"cors_origins": ["http://localhost:8080"]
}
}
# Development mode
make dev
# Production mode
make run
# Background service
make build
./target/release/vibespeak &
Vibespeak comes with an extensive set of pre-configured voice commands for common tasks:
- "open browser" - Opens default web browser
- "open terminal" - Opens new terminal window
- "show menu" - Opens application menu
- "lock screen" - Locks the screen
- "desk one/two/three/four" - Switch workspaces
- "focus left/right/up/down" - Move focus between windows
- "close window" - Close active window
- "full screen" - Toggle fullscreen mode
- "split pane" - Create vertical split
- "split horizontal" - Create horizontal split
- "pane one/two/three..." - Switch between panes
- "new window" - Create new tmux window
- "start dev" - Start development server
- "check code" - Run code linting/checking
- "save file" - Save current file
- "open editor" - Open text editor
- "list files" - Show directory contents
- "open files" - Open file manager
- "clear screen" - Clear terminal
- "type" - Enter voice typing mode (dictation)
- Win+T - Toggle typing mode
- Esc - Exit typing mode
All commands support fuzzy matching for natural speech recognition.
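The README does not say how fuzzy matching is implemented. One common approach, sketched here as an assumption, is to compare the recognized phrase against each known command by Levenshtein edit distance and accept the closest one within a threshold. The names `edit_distance` and `best_match` are hypothetical.

```rust
/// Levenshtein edit distance between two strings, counted in characters.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the first i chars of a and first j of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + if ca == cb { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the known command closest to `heard`, if it is within `max_distance` edits.
/// Commands are expected in lowercase; `heard` is trimmed and lowercased first.
pub fn best_match<'a>(heard: &str, commands: &[&'a str], max_distance: usize) -> Option<&'a str> {
    let heard = heard.trim().to_lowercase();
    commands
        .iter()
        .copied()
        .map(|command| (edit_distance(&heard, command), command))
        .filter(|(distance, _)| *distance <= max_distance)
        .min_by_key(|(distance, _)| *distance)
        .map(|(_, command)| command)
}
```

A small threshold such as 2 tolerates minor recognition slips while still rejecting unrelated phrases; the right value would need tuning against real transcripts.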
- Open http://localhost:8080
- Go to "Voice Commands" tab
- Click "Add Command"
- Enter voice phrase and corresponding action
- Test recognition and save
Create multi-step automation sequences:
{
"name": "Code Review",
"trigger": "start code review",
"steps": [
{
"type": "execute",
"command": "git fetch origin main"
},
{
"type": "script",
"language": "bash",
"content": "cargo check"
},
{
"type": "user_prompt",
"message": "Code review complete. Any issues?"
}
]
}
Execute custom scripts via voice:
Bash Script Example:
#!/bin/bash
# Save as deploy.sh
echo "Starting deployment..."
npm run build
docker build -t myapp .
docker run -d myapp
Voice Command: "deploy application"
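To make the shape of a workflow definition concrete, the "Code Review" workflow shown earlier can be modeled as plain Rust data. This is only a sketch: the `Step` and `Workflow` types and the `plan` dry-run method are hypothetical and are not taken from the Vibespeak source.

```rust
/// One workflow step, mirroring the "type" values used in the workflow JSON.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    Execute { command: String },
    Script { language: String, content: String },
    UserPrompt { message: String },
}

/// A named sequence of steps started by a voice trigger.
#[derive(Debug, Clone, PartialEq)]
pub struct Workflow {
    pub name: String,
    pub trigger: String,
    pub steps: Vec<Step>,
}

impl Workflow {
    /// Dry run: describe each step in order without executing anything.
    pub fn plan(&self) -> Vec<String> {
        self.steps
            .iter()
            .map(|step| match step {
                Step::Execute { command } => format!("run: {command}"),
                Step::Script { language, content } => format!("script ({language}): {content}"),
                Step::UserPrompt { message } => format!("ask: {message}"),
            })
            .collect()
    }
}

/// The "Code Review" workflow from the JSON example, built by hand.
pub fn code_review() -> Workflow {
    Workflow {
        name: "Code Review".to_string(),
        trigger: "start code review".to_string(),
        steps: vec![
            Step::Execute { command: "git fetch origin main".to_string() },
            Step::Script { language: "bash".to_string(), content: "cargo check".to_string() },
            Step::UserPrompt { message: "Code review complete. Any issues?".to_string() },
        ],
    }
}
```

A dry-run method like `plan` is a cheap way to preview what a trigger phrase would do before allowing it to run commands.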
Control web browsers programmatically:
{
"action": "browser_navigate",
"url": "https://github.com/myrepo"
}
# Initial setup
make setup
# Development modes
make dev # Interactive mode selection (web/listen)
make dev-listen # Start voice listening mode directly
make dev-web # Start web interface mode directly
# Alternative: Use environment variables
VIBESPEAK_MODE=listen make dev # Auto-start voice listening
VIBESPEAK_MODE=web make dev # Auto-start web interface
# Testing & quality
make test # Run all tests
make check # Format, lint, and test
make format # Format code only
vibespeak/
├── src/
│   ├── domain/                  # Business logic & entities
│   ├── application/             # Use cases & services
│   ├── infrastructure/          # External interfaces & adapters
│   │   └── adapters/
│   │       ├── keyboard_simulator.rs  # Real keyboard via uinput
│   │       ├── vosk_adapter.rs        # Speech recognition
│   │       └── tts_adapter.rs         # Text-to-speech
│   ├── presentation/
│   │   ├── axum_server/         # Axum-based web server (new)
│   │   │   ├── handlers/        # Request handlers
│   │   │   ├── routes/          # Route definitions
│   │   │   └── state.rs         # Application state
│   │   └── cli/                 # Command-line interface
│   └── shared/                  # Common utilities & types
├── frontend/                    # Leptos CSR frontend (new)
│   ├── src/
│   │   ├── components/          # Reusable UI components
│   │   ├── pages/               # Page components
│   │   ├── api.rs               # Backend API client
│   │   └── state.rs             # Application state
│   └── Cargo.toml               # Frontend dependencies
├── config/                      # Configuration files
├── model/                       # Voice recognition models
├── docs/                        # Documentation
├── tests/                       # Integration tests
└── Makefile                     # Build automation
Add business rules to src/domain/
Implement use cases in src/application/
Add external integrations in src/infrastructure/
Update web/index.html and API endpoints
Implement in src/domain/services/plugin.rs
# Unit tests
cargo test
# Integration tests
cargo test --test integration
# With coverage (requires tarpaulin)
make test-coverage
# Optimized release build
make build
# Create release archive
make release
# Docker deployment
make docker
- Install Tailscale: https://tailscale.com/download
- Authenticate: sudo tailscale up
- Configure Vibespeak to bind to Tailscale interface
{
"settings": {
"tailscale_enabled": true,
"web_server_bind": "100.64.0.1:8080"
}
}
# Local access
ssh -L 8080:localhost:8080 user@remote-server
# Then access http://localhost:8080
# WireGuard or OpenVPN configuration
# Bind Vibespeak to VPN interface
Error: Failed to load Vosk model
Solution:
# Verify model exists
ls -la model/
# Download and extract the recommended model
cd model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip
unzip vosk-model-en-us-0.22-lgraph.zip
# Update config to point to the extracted directory
Error: Piper TTS not found. Piper TTS is required for voice synthesis.
Solution:
Piper TTS must be built from source. The automated setup (make setup) handles this, but for manual installation:
# Install required system dependencies
sudo pacman -S cmake fmt spdlog onnxruntime-cpu espeak-ng # Arch Linux
# OR for Ubuntu: sudo apt install cmake libfmt-dev libspdlog-dev onnxruntime libespeak-ng-dev
# Build Piper TTS from source
git clone https://github.com/rhasspy/piper.git
cd piper
mkdir build && cd build
cmake ..
make -j$(nproc)
# Install locally in Vibespeak project
cp piper ../../piper/
cp -r pi/lib/* ../../piper/lib/
cp -r pi/share/* ../../piper/share/
# Download voice model
cd ../../models
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/en/en_US/amy/medium/en_US-amy-medium.onnx.json
# Test Piper
echo "Hello world" | ../piper/piper --model en_US-amy-medium.onnx --output_file test.wav
Error: No audio input device available
Solution:
# List available devices
arecord -l
# Configure specific device in config/system.json
{
"settings": {
"audio_device": "hw:1,0"
}
}
Error: Address already in use (os error 98)
Solution:
# Kill process using port 8080
sudo lsof -ti:8080 | xargs kill -9
# Or change port in config
{
"settings": {
"web_server_port": 8081
}
}
Error: Permission denied (os error 13)
Solution:
# Run with appropriate permissions
sudo ./target/release/vibespeak
# Or configure user permissions for audio devices
sudo usermod -a -G audio $USER
Check:
# Verify server is running
curl http://localhost:8080/api/config
# Check firewall settings
sudo ufw status
sudo ufw allow 8080
- Reduce model size (use smaller Vosk model)
- Disable TTS if not needed
- Lower audio sample rate
- Use smaller voice models
- Disable unused plugins
- Monitor with htop or top
- Use larger Vosk model
- Improve microphone quality
- Reduce background noise
- Speak clearly and closer to microphone
- Check CPU usage during recognition
- Reduce concurrent processes
- Use wired microphone instead of Bluetooth
# Enable debug logging
RUST_LOG=debug make dev
# View logs
tail -f /tmp/vibespeak.log
# Verbose build
cargo build --verbose
┌─────────────────────────────────────┐
│ Presentation Layer │
│ - Leptos CSR Frontend (integrated) │
│ - Axum REST API │
│ - WebSocket │
│ - Single full-stack server │
└─────────────────────────────────────┘
│
┌─────────────────────────────────────┐
│ Application Layer │
│ - Use Cases │
│ - Application Services │
│ - DTOs │
└─────────────────────────────────────┘
│
┌─────────────────────────────────────┐
│ Domain Layer │
│ - Entities │
│ - Value Objects │
│ - Domain Services │
│ - Business Rules │
└─────────────────────────────────────┘
│
┌─────────────────────────────────────┐
│ Infrastructure Layer │
│ - Vosk Adapter (Speech-to-Text) │
│ - TTS Adapter (Piper) │
│ - Keyboard Simulator (evdev/uinput)│
│ - File System │
└─────────────────────────────────────┘
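The layering in the diagram can be illustrated with a port-and-adapter sketch: the domain defines a trait, the infrastructure layer implements it, and the application layer depends only on the trait. All names below (`SpeechRecognizer`, `ScriptedRecognizer`, `HandleAudio`) are hypothetical; the real Vosk adapter in src/infrastructure/ is not shown in this README.

```rust
/// Domain layer: a port describing what the core needs from speech recognition.
pub trait SpeechRecognizer {
    /// Turn a chunk of 16-bit PCM audio into text, if anything was recognized.
    fn transcribe(&mut self, pcm: &[i16]) -> Option<String>;
}

/// Infrastructure layer: a stand-in adapter that replays scripted phrases.
/// A real adapter would wrap the Vosk recognizer instead.
pub struct ScriptedRecognizer {
    pub phrases: Vec<String>,
}

impl SpeechRecognizer for ScriptedRecognizer {
    fn transcribe(&mut self, _pcm: &[i16]) -> Option<String> {
        if self.phrases.is_empty() {
            None
        } else {
            Some(self.phrases.remove(0))
        }
    }
}

/// Application layer: a use case that only knows about the port.
pub struct HandleAudio<R: SpeechRecognizer> {
    recognizer: R,
    known_commands: Vec<String>,
}

impl<R: SpeechRecognizer> HandleAudio<R> {
    pub fn new(recognizer: R, known_commands: Vec<String>) -> Self {
        Self { recognizer, known_commands }
    }

    /// Return the recognized phrase only when it is a known command.
    pub fn handle(&mut self, pcm: &[i16]) -> Option<String> {
        let text = self.recognizer.transcribe(pcm)?;
        self.known_commands.iter().find(|k| **k == text).cloned()
    }
}
```

Because the use case is generic over the trait, it can be tested with the scripted adapter and no microphone or model files.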
Vibespeak uses real kernel-level keyboard events via Linux's uinput interface, making voice dictation work in any application - not just web browsers.
How it works:
- Voice input is captured and converted to text via Vosk
- Text is sent to the keyboard_simulator module
- The module creates a virtual keyboard device via /dev/uinput
- Individual key press/release events are emitted at the kernel level
- The active application receives real keyboard input
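The text-to-key-events step above can be sketched as a pure function that turns a string into press and release events. The key codes are the standard Linux input event codes (for example KEY_A = 30, KEY_LEFTSHIFT = 42), mapped here for a US layout and a small character subset only. The `KeyEvent` type and both function names are hypothetical, and the sketch stops short of writing the events to /dev/uinput.

```rust
/// Linux input event code for the left Shift key.
pub const KEY_LEFTSHIFT: u16 = 42;

/// A single key transition: `pressed` is true for key down, false for key up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyEvent {
    pub code: u16,
    pub pressed: bool,
}

/// Map a character to (key code, needs_shift) for a US layout.
/// Covers ASCII letters, space, and newline only.
pub fn key_for(c: char) -> Option<(u16, bool)> {
    let code = match c.to_ascii_lowercase() {
        'a' => 30, 'b' => 48, 'c' => 46, 'd' => 32, 'e' => 18, 'f' => 33,
        'g' => 34, 'h' => 35, 'i' => 23, 'j' => 36, 'k' => 37, 'l' => 38,
        'm' => 50, 'n' => 49, 'o' => 24, 'p' => 25, 'q' => 16, 'r' => 19,
        's' => 31, 't' => 20, 'u' => 22, 'v' => 47, 'w' => 17, 'x' => 45,
        'y' => 21, 'z' => 44, ' ' => 57, '\n' => 28,
        _ => return None,
    };
    Some((code, c.is_ascii_uppercase()))
}

/// Expand text into press/release events, wrapping uppercase letters in Shift.
/// Characters outside the supported subset are skipped.
pub fn events_for(text: &str) -> Vec<KeyEvent> {
    let mut events = Vec::new();
    for c in text.chars() {
        if let Some((code, shift)) = key_for(c) {
            if shift {
                events.push(KeyEvent { code: KEY_LEFTSHIFT, pressed: true });
            }
            events.push(KeyEvent { code, pressed: true });
            events.push(KeyEvent { code, pressed: false });
            if shift {
                events.push(KeyEvent { code: KEY_LEFTSHIFT, pressed: false });
            }
        }
    }
    events
}
```

Keeping the mapping separate from the device I/O means it can be unit tested without access to /dev/uinput.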
Requirements:
- Linux with uinput support (most distributions)
- Permission to access /dev/uinput:
# Temporary (resets on reboot)
sudo chmod 666 /dev/uinput
# Permanent (recommended)
sudo tee /etc/udev/rules.d/99-uinput.rules <<EOF
KERNEL=="uinput", MODE="0666", GROUP="input"
EOF
sudo udevadm control --reload-rules
Fallback: If uinput is unavailable, Vibespeak falls back to xdotool for X11 systems.
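The fallback rule can be written as a small decision function. How Vibespeak actually probes for uinput and X11 is not documented here, so the checks in `detect_backend` (opening /dev/uinput for writing and looking for the DISPLAY variable) are assumptions, and the `Backend` enum is a hypothetical name.

```rust
use std::fs::OpenOptions;

/// Which keyboard backend to use (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Uinput,
    Xdotool,
    Unavailable,
}

/// Prefer uinput; fall back to xdotool only when an X11 session is present.
pub fn choose_backend(uinput_writable: bool, x11_session: bool) -> Backend {
    if uinput_writable {
        Backend::Uinput
    } else if x11_session {
        Backend::Xdotool
    } else {
        Backend::Unavailable
    }
}

/// Probe the running system (assumed checks, see the note above).
pub fn detect_backend() -> Backend {
    // Opening the node for writing only tests permissions; no device is created.
    let uinput_writable = OpenOptions::new().write(true).open("/dev/uinput").is_ok();
    let x11_session = std::env::var_os("DISPLAY").is_some();
    choose_backend(uinput_writable, x11_session)
}
```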
Extensible architecture supporting:
- Command Plugins: Custom voice commands
- Workflow Plugins: Complex automation sequences
- Integration Plugins: External service connections
- Script Plugins: Custom script execution engines
- Sandboxed Execution: Restricted script environments
- Trusted Execution: Full system access for approved scripts
- Isolated Execution: Container-based execution for untrusted code
- Permission System: Granular access controls
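As an illustration of the permission checks, the `allowed_paths` and `blocked_commands` settings from the configuration section could be enforced roughly as follows. This is a deliberately naive sketch with hypothetical names: it compares paths lexically and matches blocked commands by substring, so a real implementation would also need to canonicalize paths and parse commands properly.

```rust
use std::path::Path;

/// Mirrors the allowed_paths and blocked_commands configuration keys.
pub struct SecurityPolicy {
    pub allowed_paths: Vec<String>,
    pub blocked_commands: Vec<String>,
}

impl SecurityPolicy {
    /// True when `path` lies under one of the allowed roots.
    /// `Path::starts_with` compares whole components, so "/home/username"
    /// is not treated as being inside "/home/user". It does not resolve "..".
    pub fn path_allowed(&self, path: &str) -> bool {
        let path = Path::new(path);
        self.allowed_paths.iter().any(|root| path.starts_with(root))
    }

    /// True when the command contains none of the blocked fragments.
    pub fn command_allowed(&self, command: &str) -> bool {
        !self
            .blocked_commands
            .iter()
            .any(|blocked| command.contains(blocked.as_str()))
    }
}
```

Component-wise prefix checks avoid the classic mistake of accepting a sibling directory that merely shares a string prefix with an allowed root.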
GET /api/config # Get current configuration
POST /api/config # Update configuration
POST /api/voice/test # Test voice recognition
GET /api/status # System status
GET /api/logs # System logs
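For a quick check of the GET endpoints without extra dependencies, a plain HTTP/1.1 request can be sent over a TCP socket using only the standard library. This is a sketch under assumptions: the server is taken to listen on localhost:8080 as in the default configuration, the response body format is not documented here, and `build_get` and `fetch` are hypothetical helper names.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a minimal HTTP/1.1 GET request for `path` on `host` (for example "localhost:8080").
pub fn build_get(host: &str, path: &str) -> String {
    format!(
        "GET {path} HTTP/1.1\r\nHost: {host}\r\nAccept: application/json\r\nConnection: close\r\n\r\n"
    )
}

/// Send the request and return the raw response, headers included.
/// Fails with an I/O error if the server is not running.
pub fn fetch(host: &str, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(build_get(host, path).as_bytes())?;
    let mut response = String::new();
    // "Connection: close" makes the server end the stream after the response.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling `fetch("localhost:8080", "/api/status")` while the server is running should return the status line, headers, and body as one string; a full HTTP client crate would be the better choice for anything beyond a smoke test.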
// Voice recognition
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.type === "recognition_result") {
console.log("Recognized:", data.text);
}
};
# Fork and clone
git clone https://github.com/yourusername/vibespeak.git
cd vibespeak
# Set up development environment
make setup
make dev-deps
# Create feature branch
git checkout -b feature/your-feature
- Rust: Follow official Rust guidelines
- Documentation: Document all public APIs
- Testing: 80%+ code coverage required
- Security: No unsafe code without security review
- Create feature branch
- Write tests for new functionality
- Update documentation
- Run make check to ensure quality
- Submit PR with detailed description
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: https://github.com/yourusername/vibespeak/issues
- Discussions: https://github.com/yourusername/vibespeak/discussions
- Documentation: https://vibespeak.dev/docs
- Vosk: Open-source offline speech recognition
- Piper TTS: High-quality neural text-to-speech synthesis (built from source)
- ONNX Runtime: Cross-platform ML inference engine
- eSpeak-ng: Phoneme data for speech synthesis
- Tokio: Async runtime for Rust
- Axum: Modern, ergonomic web framework for the backend API
- Leptos: Reactive web framework for the frontend (CSR mode)
- evdev: Linux input device library for real keyboard simulation
- Tailscale: Secure remote access networking
- Chromium: Browser automation engine
Built with love for privacy-focused voice automation