Speaches

App in the BluixApps catalog

What it is

Speaches is a self-hosted speech-to-text (STT) and text-to-speech (TTS) server with an OpenAI-compatible API. It wraps Whisper (STT) and Piper / Kokoro (TTS) and exposes them through the standard OpenAI endpoints /v1/audio/transcriptions and /v1/audio/speech.

A drop-in replacement for the OpenAI Whisper API with no per-transcription fee: it runs on your own VPS or GPU, so you pay only for the hardware.
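Because the endpoints follow OpenAI's wire format, any plain HTTP client can call them, not just an OpenAI SDK. The minimal sketch below uses only the Python standard library to build, without sending, a multipart request to /v1/audio/transcriptions. The `model` and `file` form fields follow OpenAI's transcription API. The helper name `build_transcription_request`, the base URL http://localhost:8000 and the model ID Systran/faster-whisper-base are assumptions for illustration; check them against your own deployment.

```python
import urllib.request
import uuid
from typing import Optional


def build_transcription_request(
    base_url: str,
    audio_bytes: bytes,
    filename: str,
    model: str,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /v1/audio/transcriptions.

    Hypothetical helper; field names `model` and `file` follow OpenAI's
    transcription API, which Speaches aims to be compatible with.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field carrying the model identifier.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode("utf-8"),
        # File field header, followed by the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"),
        audio_bytes,
        b"\r\n",
        # Closing boundary terminates the multipart body.
        f"--{boundary}--\r\n".encode("utf-8"),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        # Only needed when a key-checking proxy sits in front of Speaches.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` against a running server should return JSON whose `text` field holds the transcript, as with OpenAI's default response format.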

What it's for

  • Self-hosted transcription — replace OpenAI Whisper API with predictable VPS cost
  • Voice assistant TTS — synthesize speech for self-hosted Alexa-style apps
  • Audio content production — bulk transcribe podcasts, meetings, lectures
  • Real-time streaming STT — live captions, voice control
  • Multi-language speech — Whisper handles about 100 languages out of the box

Who it's for

  • Indie SaaS founders building voice features without OpenAI per-minute costs
  • Podcasters bulk transcribing back catalogs without metered API spend
  • Privacy-bound apps needing voice processing without cloud upload
  • Voice assistant developers building self-hosted Alexa/Google Home alternatives
  • AI engineers integrating voice into LLM agents (Open WebUI, LibreChat)

Why teams pick Speaches over alternatives

  • OpenAI-compatible API — OpenAI Whisper client SDKs work unchanged once pointed at Speaches' base URL
  • Multiple model sizes — Whisper tiny / base / small / medium / large
  • CPU + GPU support — runs on modest hardware for testing, scales with GPU
  • MIT license — commercial use unrestricted
  • Streaming support — real-time transcription for live audio
  • Active development — frequent releases tracking upstream Whisper

Integrations

  • OpenAI SDKs — Python, JS, every official OpenAI client works
  • STT engines — Whisper (multiple sizes), Faster-Whisper (optimized)
  • TTS engines — Piper (fast), Kokoro (quality)
  • Audio formats — MP3, WAV, M4A, OGG, FLAC input; WAV / MP3 output
  • VAD — Voice Activity Detection for streaming
  • Webhook support — async transcription completion callbacks
  • HTTP REST — primary API surface
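On the TTS side, OpenAI's /v1/audio/speech takes a JSON body with `model`, `input`, `voice` and an optional `response_format`, and returns raw audio bytes. A minimal standard-library sketch of that request follows. The helper name `build_speech_request` is hypothetical, and any Kokoro or Piper model ID and voice name you pass in (for example speaches-ai/Kokoro-82M-v1.0-ONNX with voice af_heart) is an assumption to verify against the models your server has installed.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    base_url: str,
    text: str,
    model: str,
    voice: str,
    response_format: str = "mp3",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /v1/audio/speech.

    Hypothetical helper; payload keys follow OpenAI's speech API.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        # Output container, e.g. "mp3" or "wav".
        "response_format": response_format,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when a key-checking proxy sits in front of Speaches.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Since the response body is the audio itself, a caller would typically write `urlopen(req).read()` straight to a .mp3 or .wav file.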

Notable users & community

  • 5k+ GitHub stars (rapidly growing)
  • Featured in self-hosted voice AI guides
  • Active Discord community
  • Strong adoption in privacy-bound voice applications
  • Frequent releases matching OpenAI API evolution

Tips & operations

  • GPU strongly recommended for large models — Whisper large on CPU is too slow for practical use; medium is workable on a modern CPU
  • Pre-download models — the first request for a model triggers its download; bake models into the image to avoid stalls
  • Audio format conversion — Speaches transcodes via ffmpeg; some formats need explicit re-encoding
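  • Mind disk usage — Whisper parameter counts are tiny 39M, base 74M, small 244M, medium 769M, large 1.55B; at float16 (2 bytes per parameter) that is roughly 78 MB, 148 MB, 488 MB, 1.5 GB and 3.1 GB on disk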
  • Streaming adds overhead — VAD and chunking add latency, most noticeably on CPU
  • Auth at proxy layer — Speaches has no built-in auth; protect it with an API-key proxy
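The figures in the disk-usage tip correspond to Whisper's parameter counts in millions; actual on-disk size scales with the precision the weights are stored in. A rough back-of-envelope sketch, assuming size is simply parameters times bytes per parameter (it ignores tokenizer, config and container overhead), can help size the model cache volume. The helper names are hypothetical.

```python
# Whisper parameter counts in millions (large = 1.55B).
WHISPER_PARAMS_M = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}

# Bytes needed to store one weight at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def estimate_disk_mb(model: str, precision: str = "float16") -> int:
    """Approximate on-disk size in MB (1 MB = 10**6 bytes).

    Millions of parameters x bytes per parameter = megabytes.
    Ignores tokenizer/config files and format overhead.
    """
    return WHISPER_PARAMS_M[model] * BYTES_PER_PARAM[precision]


def cache_budget_mb(models: list, precision: str = "float16") -> int:
    """Total approximate size for a set of pre-downloaded models."""
    return sum(estimate_disk_mb(m, precision) for m in models)
```

For example, pre-loading tiny and base at float16 comes to about 226 MB, while large alone is about 3.1 GB at float16 and about half that if stored as int8.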

What we ship in BluixApps

  • Docker compose: Speaches server + model cache volume
  • Pinned ghcr.io/speaches-ai/speaches image (a release tag, not the floating :latest)
  • HTTPS via Let's Encrypt; API key auth via proxy
  • Whisper-base + Kokoro voices pre-downloaded
  • GPU passthrough optional (significantly faster for large models)
  • OpenAI-compatible /v1/audio/transcriptions + /v1/audio/speech endpoints
  • Stateless service — no backup needed beyond config
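Since authentication lives in the proxy rather than in Speaches, the proxy's job reduces to one check per request: does the `Authorization` header carry the expected Bearer key? A minimal sketch of that check follows. It is not BluixApps' actual proxy configuration, and the function name `is_authorized` is hypothetical. It treats header names and the Bearer scheme case-insensitively, as HTTP does, and compares keys with `hmac.compare_digest` so response timing does not reveal how much of a guessed key matched.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True only for 'Authorization: Bearer <expected_key>'."""
    if not expected_key:
        # Refuse everything if no key is configured, rather than accepting "".
        return False
    # HTTP header names are case-insensitive; normalise before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison of the supplied token against the expected key.
    return hmac.compare_digest(token.encode("utf-8"), expected_key.encode("utf-8"))
```

A proxy would forward the request to Speaches when this returns True and answer 401 otherwise.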

Get this app — pick a BluixApps plan

Plans scale tenant count, catalog access, support tier and white-label options.

Tier        Tenants     Catalog              Support           White-label    Monthly
Stacks      1           19 curated stacks    Standard          n/a            $19/mo
Starter     10          Full catalog         Standard          +$15–25/mo     $49/mo
Pro         25          Full catalog         Priority bugfix   +$15–25/mo     $149/mo
Growth      100         Full catalog         Priority bugfix   +$15–25/mo     $349/mo
Scale       500         Full catalog         7-day window      +$15–25/mo     $799/mo
Enterprise  Unlimited   Full catalog         Priority 7-day    Bundled        $1,499/mo
