Whisper

App in the BluixApps catalog

What it is

Whisper is OpenAI's open-source speech-to-text (STT) model. It is multilingual and copes well with accents, background noise, and technical vocabulary, and it has often been described as the best free STT model available since its 2022 release. The Whisper deployment in BluixApps wraps the model in a REST API server (Whisper-WebUI or Faster-Whisper) for easy integration.

For teams who want OpenAI's transcription quality without OpenAI's per-minute pricing, self-hosted Whisper is the answer.

What it's for

  • Audio transcription — meetings, interviews, podcasts at scale
  • Subtitle generation — auto-caption videos for accessibility / localization
  • Voice command parsing — input layer for voice-controlled apps
  • Audio archive search — transcribe + index for full-text audio search
  • Multi-language transcription – a single model handles roughly 100 languages
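
For the subtitle use case, timestamped segments can be turned into SRT with a few lines of code. The sketch below assumes each segment is a dict with `start` and `end` in seconds plus a `text` field, which is the shape of the segments in an OpenAI-style `verbose_json` response; the function names are illustrative and not part of any Whisper package.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT cue blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cue blocks are separated by a blank line.
    return "\n".join(blocks)
```

If the server already returns SRT or VTT directly, requesting that output format is simpler; a helper like this is mainly useful when the JSON segments are post-processed first (merged, filtered, or re-timed).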

Who it's for

  • Media production teams transcribing video / podcast back catalogs
  • Accessibility teams captioning content under ADA / WCAG requirements
  • Privacy-bound apps processing sensitive audio (legal, medical) on-prem
  • AI engineers building voice-input layers for LLM apps
  • Cost-conscious teams moving away from OpenAI / AssemblyAI per-minute billing

Why teams pick self-hosted Whisper over alternatives

  • MIT license — fully open, commercial use unrestricted
  • OpenAI-grade quality – the same model family that powers OpenAI's hosted Whisper API
  • Multi-language – roughly 100 languages, with automatic language detection
  • Robust – holds up well on accents, noise, and background music
  • Hardware flexibility — runs on CPU (slow) or GPU (fast)
  • Multiple model sizes – tiny (39M parameters) to large (about 1.55B parameters) trade off speed vs accuracy
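
The size trade-off can be made concrete with a small lookup. This is a minimal sketch: the parameter counts and approximate VRAM figures are the ones listed in the openai/whisper README for the reference PyTorch implementation, and `pick_model` is a hypothetical helper, not part of any Whisper package. Faster-Whisper with quantization generally needs less memory than these figures suggest.

```python
# (name, parameters in millions, approximate VRAM in GB) per the openai/whisper README.
WHISPER_MODELS = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("large", 1550, 10),
]


def pick_model(vram_gb):
    """Return the largest model whose approximate VRAM need fits, or None."""
    fitting = [name for name, _params, need in WHISPER_MODELS if need <= vram_gb]
    # The table is ordered smallest to largest, so the last match is the biggest.
    return fitting[-1] if fitting else None
```

For example, `pick_model(8)` selects `medium`, while a budget under 1 GB returns `None`, which is a cue to run on CPU with a small model instead.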

Integrations

  • OpenAI-compatible API – the bundled server exposes /v1/audio/transcriptions
  • Audio formats — MP3, WAV, M4A, OGG, FLAC, MP4 video via ffmpeg
  • Output formats — JSON (with timestamps), SRT, VTT, plain text
  • Word-level timestamps — for precise subtitle / search applications
  • Translation mode — transcribe non-English audio to English text
  • Webhook — async transcription completion callbacks
  • Python / JS SDKs — via OpenAI client pointed at Whisper endpoint
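
A request to the OpenAI-compatible endpoint is a multipart form upload. The sketch below builds one with only the Python standard library so the wire format is visible; in practice the OpenAI client with its base URL pointed at the Whisper server does the same job. The host name, API key, and the model name `base` are placeholders: the exact model identifier a given server expects is an assumption to check against that server's documentation.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, filename, audio_bytes,
                                model="base", response_format="json"):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields defined by the OpenAI transcription API.
    for name, value in (("model", model), ("response_format", response_format)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio itself goes in the "file" field.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending is then a matter of passing the request to `urllib.request.urlopen` and decoding the JSON body. Setting `response_format` to `srt`, `vtt`, or `text` selects the other output formats listed above, assuming the server supports them.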

Notable users & community

  • 70k+ GitHub stars on openai/whisper
  • Derived projects: Faster-Whisper (CTranslate2 reimplementation), WhisperX (word-level alignment), distil-whisper (distilled models)
  • Featured in countless self-hosted media / accessibility guides
  • Strong community across r/LocalLLaMA, r/selfhosted
  • Foundation of many modern open-source STT stacks

Tips & operations

  • GPU strongly recommended for the large model – on CPU, large-model transcription is impractically slow
  • Use Faster-Whisper for production – CTranslate2-based, up to 4× faster than vanilla Whisper
  • Pre-download models — bake into image to avoid first-request delays
  • Disable VAD for clean audio – Voice Activity Detection adds overhead with little benefit on clean audio such as studio podcasts
  • Batch transcription – cap the number of concurrent requests on a GPU and queue the rest to avoid out-of-memory (OOM) errors
  • Model size trade-off — base/small for real-time, medium/large for quality
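
The batching tip amounts to putting a gate in front of the GPU. A minimal sketch, assuming a caller-supplied `transcribe(path)` function (hypothetical; it could wrap an HTTP call or a local model): a semaphore limits how many transcriptions run at once while the thread pool keeps accepting work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def transcribe_all(paths, transcribe, max_gpu_jobs=1, workers=8):
    """Run transcribe(path) for every path, at most max_gpu_jobs at a time."""
    gate = threading.Semaphore(max_gpu_jobs)

    def guarded(path):
        # Extra workers block here instead of piling onto the GPU.
        with gate:
            return transcribe(path)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(guarded, paths))
```

With `max_gpu_jobs=1` the jobs are fully serialized; raising it trades memory headroom for throughput and is worth testing against the chosen model size before production use.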

What we ship in BluixApps

  • Docker compose: Faster-Whisper server (CTranslate2-optimized) + model cache
  • Image pinned to the latest stable release
  • HTTPS via Let's Encrypt; API key auth via proxy
  • OpenAI-compatible /v1/audio/transcriptions endpoint
  • Whisper base and medium models pre-downloaded to avoid first-request delay
  • GPU passthrough optional (3–5× faster than CPU)
  • Stateless service — no backup needed beyond config

Read this app's deep dive on bluix.app

Get this app — pick a BluixApps plan

All plans draw on the same catalog; they scale in tenant isolation, white-label options, and support tier.

Tier         Tenants     Catalog             Support           White-label    Monthly
Stacks       1           19 curated stacks   Standard          -              $19/mo
Starter      10          Full catalog        Standard          +$15–25/mo     $49/mo
Pro          25          Full catalog        Priority bugfix   +$15–25/mo     $149/mo
Growth       100         Full catalog        Priority bugfix   +$15–25/mo     $349/mo
Scale        500         Full catalog        7-day window      +$15–25/mo     $799/mo
Enterprise   Unlimited   Full catalog        Priority 7-day    Bundled        $1,499/mo
