jamiepine / voicebox

The open-source voice synthesis studio powered by Qwen3-TTS.

10,006 stars
1,049 forks
115 issues
TypeScript, Python, Rust

Voicebox

The open-source voice synthesis studio.
Clone voices. Generate speech. Build voice-powered apps.
All running locally on your machine.

What is Voicebox?

Voicebox is a local-first voice cloning studio with DAW-like features for professional voice synthesis. Think of it as a free, open-source alternative to ElevenLabs that runs locally: download models, clone voices, and generate speech entirely on your machine.

Unlike cloud services that lock your voice data behind subscriptions, Voicebox gives you:

  • Complete privacy — models and voice data stay on your machine
  • Professional tools — multi-track timeline editor, audio trimming, conversation mixing
  • Model flexibility — currently powered by Qwen3-TTS, with support for XTTS, Bark, and other models coming soon
  • API-first — use the desktop app or integrate voice synthesis into your own projects
  • Native performance — built with Tauri (Rust), not Electron
  • Super fast on Mac — MLX backend with native Metal acceleration for 4-5x faster inference on Apple Silicon

Download a voice model, clone any voice from a few seconds of audio, and compose multi-voice projects with studio-grade editing tools. No Python install required, no cloud dependency, no limits.


Download

Voicebox is available now for macOS and Windows.

Platform               Download
macOS (Apple Silicon)  voicebox_aarch64.app.tar.gz
macOS (Intel)          voicebox_x64.app.tar.gz
Windows (MSI)          voicebox_0.1.0_x64_en-US.msi
Windows (Setup)        voicebox_0.1.0_x64-setup.exe

Linux builds are coming soon; they are currently blocked by GitHub runner disk space limitations.


Features

Voice Cloning with Qwen3-TTS

Powered by Alibaba's Qwen3-TTS — a breakthrough model that achieves near-perfect voice cloning from just a few seconds of audio.

  • Instant cloning: upload a sample, get a voice profile
  • High fidelity: natural prosody, emotion, and cadence
  • Multi-language: English and Chinese today, with more languages planned
  • Lightning fast on Mac: the MLX backend uses Metal GPU acceleration on Apple Silicon for fast generation

Voice Profile Management

  • Create profiles from audio files or record directly in-app
  • Import/Export profiles to share or backup
  • Multi-sample support — combine multiple samples for higher quality cloning
  • Organize with descriptions and language tags

Speech Generation

  • Text-to-speech with any cloned voice
  • Batch generation for long-form content
  • Smart caching — regenerate instantly with voice prompt caching
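
The README does not describe how voice prompt caching is implemented. As a rough illustration of the idea only, the sketch below memoizes an expensive per-voice prompt, keyed by the profile ID and a digest of the reference audio. `VoicePromptCache` and `build_prompt` are hypothetical names, not part of Voicebox.

```python
import hashlib
from typing import Callable, Dict, Tuple


class VoicePromptCache:
    """Memoize an expensive per-voice prompt so repeat generations can skip it."""

    def __init__(self, build_prompt: Callable[[bytes], object]) -> None:
        self._build_prompt = build_prompt  # hypothetical: audio bytes -> prompt
        self._cache: Dict[Tuple[str, str], object] = {}
        self.misses = 0

    def get(self, profile_id: str, sample_audio: bytes) -> object:
        # Key on the profile and a digest of the audio, so changing the
        # reference sample produces a new key and a fresh prompt.
        key = (profile_id, hashlib.sha256(sample_audio).hexdigest())
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._build_prompt(sample_audio)
        return self._cache[key]


# Stand-in prompt builder; a real one would run the voice model's encoder.
cache = VoicePromptCache(build_prompt=lambda audio: {"frames": len(audio)})
first = cache.get("abc123", b"\x00\x01\x02")
second = cache.get("abc123", b"\x00\x01\x02")  # served from the cache
```

Hashing the sample rather than relying on the profile ID alone means an edited or replaced sample is never matched against a stale prompt.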

Stories Editor

Create multi-voice narratives, podcasts, and conversations with a timeline-based editor.

  • Multi-track composition — arrange multiple voice tracks in a single project
  • Inline audio editing — trim and split clips directly in the timeline
  • Auto-playback — preview stories with synchronized playhead
  • Voice mixing — build conversations with multiple participants
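
To make the timeline operations above concrete, here is a minimal sketch of how clips on a multi-track timeline might be modeled, with trim and split as pure functions. The `Clip` fields and function names are assumptions for illustration; the README does not document the editor's actual data model.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Clip:
    track: int          # index of the voice track the clip sits on
    start: float        # position on the timeline, in seconds
    offset: float       # where playback begins inside the source audio
    duration: float     # how much of the source audio is played

    @property
    def end(self) -> float:
        return self.start + self.duration


def trim_start(clip: Clip, seconds: float) -> Clip:
    """Drop `seconds` from the front of a clip without moving its end."""
    if not 0 <= seconds < clip.duration:
        raise ValueError("trim must be shorter than the clip")
    return replace(clip, start=clip.start + seconds,
                   offset=clip.offset + seconds,
                   duration=clip.duration - seconds)


def split(clip: Clip, at: float) -> Tuple[Clip, Clip]:
    """Cut a clip in two at timeline position `at`."""
    if not clip.start < at < clip.end:
        raise ValueError("split point must fall inside the clip")
    head = replace(clip, duration=at - clip.start)
    tail = replace(clip, start=at, offset=clip.offset + head.duration,
                   duration=clip.end - at)
    return head, tail


def total_length(clips: List[Clip]) -> float:
    """Length of the story: the latest clip end across all tracks."""
    return max((c.end for c in clips), default=0.0)
```

Keeping clips immutable and storing an offset into the source audio makes trimming and splitting non-destructive: the underlying audio is never modified.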

Recording & Transcription

  • In-app recording with waveform visualization
  • System audio capture — record desktop audio on macOS and Windows
  • Automatic transcription powered by Whisper
  • Export recordings in multiple formats

Generation History

  • Full history of all generated audio
  • Search & filter by voice, text, or date
  • Re-generate any past generation with one click
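
The README lists SQLite as the database but does not show a schema. The sketch below shows one way search and filter by voice, text, or date could be expressed against SQLite; the `generations` table, its columns, and the `search` helper are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE generations (
           id INTEGER PRIMARY KEY,
           profile_id TEXT NOT NULL,
           text TEXT NOT NULL,
           created_at TEXT NOT NULL  -- ISO 8601, so string order is date order
       )"""
)
conn.executemany(
    "INSERT INTO generations (profile_id, text, created_at) VALUES (?, ?, ?)",
    [
        ("abc123", "Hello world", "2025-01-02T10:00:00"),
        ("abc123", "Goodbye world", "2025-01-05T09:30:00"),
        ("def456", "Hello again", "2025-01-06T14:15:00"),
    ],
)


def search(profile_id: Optional[str] = None, text: Optional[str] = None,
           since: Optional[str] = None) -> List[Tuple[int, str]]:
    """Return (id, text) rows matching every filter that was supplied."""
    clauses, params = [], []
    if profile_id is not None:
        clauses.append("profile_id = ?")
        params.append(profile_id)
    if text is not None:
        clauses.append("text LIKE ?")  # case-insensitive for ASCII in SQLite
        params.append(f"%{text}%")
    if since is not None:
        clauses.append("created_at >= ?")
        params.append(since)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    query = f"SELECT id, text FROM generations {where} ORDER BY created_at DESC"
    return conn.execute(query, params).fetchall()
```

Only the clause list is assembled dynamically; every user-supplied value is passed as a bound parameter.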

Flexible Deployment

  • Local mode — Everything runs on your machine
  • Remote mode — Connect to a GPU server on your network
  • One-click server — Turn any machine into a Voicebox server

API

Voicebox exposes a full REST API, so you can integrate voice synthesis into your own apps.

# Generate speech
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123", "language": "en"}'

# List voice profiles
curl http://localhost:8000/profiles

# Create a profile
curl -X POST http://localhost:8000/profiles \
  -H "Content-Type: application/json" \
  -d '{"name": "My Voice", "language": "en"}'
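
The same calls can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl examples above; the helper names are made up for illustration, and since the README does not document the response format, `send` returns the raw body without parsing it.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

DEFAULT_BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Optional[Dict[str, Any]] = None,
                  base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a GET request, or a JSON POST when a payload is given."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_request(text: str, profile_id: str, language: str = "en",
                     base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Mirror of the `POST /generate` curl call."""
    payload = {"text": text, "profile_id": profile_id, "language": language}
    return build_request("/generate", payload, base_url)


def send(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send a request and return the raw response body, unparsed."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Building a request does not touch the network; call send(req) to execute it.
req = generate_request("Hello world", "abc123")
```

In remote mode, pass the server's address as `base_url` instead of the localhost default.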

Use cases:

  • Game dialogue systems
  • Podcast/video production pipelines
  • Accessibility tools
  • Voice assistants
  • Content creation automation

Full API documentation is available at http://localhost:8000/docs while the server is running.


Tech Stack

Layer             Technology
Desktop App       Tauri (Rust)
Frontend          React, TypeScript, Tailwind CSS
State             Zustand, React Query
Backend           FastAPI (Python)
Voice Model       Qwen3-TTS (PyTorch or MLX)
Transcription     Whisper (PyTorch or MLX)
Inference Engine  MLX (Apple Silicon) / PyTorch (Windows/Linux/Intel)
Database          SQLite
Audio             WaveSurfer.js, librosa

Why this stack?

  • Tauri over Electron — 10x smaller bundle, native performance, lower memory
  • FastAPI — Async Python with automatic OpenAPI schema generation
  • Type-safe end-to-end — Generated TypeScript client from OpenAPI spec
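
A generated client is derived from the OpenAPI document, which FastAPI serves at `/openapi.json` by default (assuming the project has not changed that path). As a small sketch of what such a document contains, the function below lists the operations it describes. The sample spec is hand-written from the endpoints shown in this README, not fetched from a running server.

```python
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs described by an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


# Hand-written sample covering only the endpoints shown in the README.
sample_spec = {
    "openapi": "3.1.0",
    "paths": {
        "/generate": {"post": {}},
        "/profiles": {"get": {}, "post": {}, "parameters": []},
    },
}
```

A client generator walks the same structure, plus the request and response schemas, to emit one typed function per operation.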

Roadmap

Voicebox is the beginning of something bigger. Here's what's coming:

Coming Soon

Feature              Description
Real-time Synthesis  Stream audio as it generates, word by word
Conversation Mode    Multi-speaker dialogues with automatic turn-taking
Voice Effects        Pitch shift, reverb, M3GAN-style effects
Timeline Editor      Audio studio with word-level precision editing
More Models          XTTS, Bark, and other open-source voice models

Future Vision

  • Voice Design — Create new voices from text descriptions
  • Project System — Save and load complex multi-voice sessions
  • Plugin Architecture — Extend with custom models and effects
  • Mobile Companion — Control Voicebox from your phone

Voicebox aims to be the one-stop shop for everything voice — cloning, synthesis, editing, effects, and beyond.


Development

See CONTRIBUTING.md for detailed setup and contribution guidelines.

Using the Makefile (recommended): Run make help to see all available commands for setup, development, building, and testing.

Quick Start

With Makefile (Unix/macOS/Linux):

# Clone the repo
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Setup everything
make setup

# Start development
make dev

Manual setup (all platforms):

# Clone the repo
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install dependencies
bun install

# Install Python dependencies
cd backend && pip install -r requirements.txt && cd ..

# Start development
bun run dev

Prerequisites: Bun, Rust, and Python 3.11+. Xcode is also required on macOS.

Performance:

  • Apple Silicon (M1/M2/M3): Uses MLX backend with native Metal acceleration for 4-5x faster inference
  • Windows/Linux/Intel Mac: Uses PyTorch backend (CUDA GPU recommended, CPU supported but slower)
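
The platform split above can be expressed as a simple check. This is an illustrative sketch, not the project's actual selection logic; `choose_backend` and `detect_backend` are hypothetical names.

```python
import platform


def choose_backend(system: str, machine: str) -> str:
    """Pick "mlx" on Apple Silicon Macs and "pytorch" everywhere else."""
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return "pytorch"


def detect_backend() -> str:
    # platform.system() is "Darwin" on macOS; platform.machine() is "arm64"
    # for a native Python on Apple Silicon and "x86_64" on Intel or Rosetta.
    return choose_backend(platform.system(), platform.machine())
```

Separating the decision from the detection keeps the rule testable on any machine.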

Project Structure

voicebox/
├── app/              # Shared React frontend
├── tauri/            # Desktop app (Tauri + Rust)
├── web/              # Web deployment
├── backend/          # Python FastAPI server
├── landing/          # Marketing website
└── scripts/          # Build & release scripts

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

  1. Fork the repo
  2. Create a feature branch
  3. Make your changes
  4. Submit a PR

Security

Found a security vulnerability? Please report it responsibly. See SECURITY.md for details.


License

MIT License — see LICENSE for details.


voicebox.sh