WhisperX

App in the BluixApps catalog

What it is

WhisperX is a production-oriented extension of OpenAI Whisper. It adds batched inference at up to 70× real-time speed, word-level timestamps via forced alignment, and speaker diarization via pyannote-audio, which makes it a common choice for serious transcription and subtitling pipelines.

Where vanilla Whisper is research code, WhisperX is the engineering-grade version.
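In a diarized transcript, each transcribed segment (or word) is matched to the diarization turn it overlaps most. The sketch below illustrates that matching idea in plain Python. It is a simplified illustration, not WhisperX's own `assign_word_speakers` code, and the dict shapes (`start`, `end`, `text`, `speaker`) are assumptions modelled loosely on WhisperX's JSON output.

```python
from typing import Optional


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[dict], turns: list[dict]) -> list[dict]:
    """Label each segment with the speaker whose turns overlap it the most.

    Assumed (hypothetical) shapes:
      segments: [{"start": 0.0, "end": 2.5, "text": "..."}]
      turns:    [{"start": 0.0, "end": 3.0, "speaker": "SPEAKER_00"}]
    Segments that overlap no turn get speaker None.
    """
    labelled = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several short turns.
        totals: dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + ov
        best: Optional[str] = max(totals, key=totals.get) if totals else None
        labelled.append({**seg, "speaker": best})
    return labelled


if __name__ == "__main__":
    segments = [{"start": 0.0, "end": 2.0, "text": "Hi there."},
                {"start": 2.0, "end": 5.0, "text": "Thanks for having me."}]
    turns = [{"start": 0.0, "end": 2.2, "speaker": "SPEAKER_00"},
             {"start": 2.2, "end": 5.0, "speaker": "SPEAKER_01"}]
    for seg in assign_speakers(segments, turns):
        print(seg["speaker"], seg["text"])
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the choice stable when one speaker's contribution is split into several short turns.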

What it's for

  • Audio/video transcription at production speed
  • Multi-speaker labelling (who said what)
  • Word-level timestamps for precise subtitle generation
  • VAD pre-processing — skip silence, faster results
  • Batch processing — entire podcasts, full meetings
  • Multilingual — 99 languages from Whisper backbone
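WhisperX's CLI can write SRT directly, but if you post-process the JSON yourself (for example, to prefix speaker labels), converting segments to SRT cues is a small formatting job. A minimal sketch, assuming each segment is a dict with `start`/`end` in seconds, `text`, and an optional `speaker` key:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues, prefixing the speaker label when present."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        if seg.get("speaker"):
            text = f"[{seg['speaker']}] {text}"
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by one blank line, as SRT players expect.
    return "\n".join(cues)


if __name__ == "__main__":
    print(segments_to_srt([
        {"start": 0.0, "end": 1.25, "text": " Welcome back.", "speaker": "SPEAKER_00"},
        {"start": 1.4, "end": 3661.5, "text": " Long answer..."},
    ]))
```

Word-level timestamps make the same approach work at finer granularity: group a few words per cue instead of a whole segment when segments are too long to read on screen.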

Who it's for

  • Podcast producers generating multi-speaker transcripts
  • Video platforms auto-generating SRT/VTT subtitles
  • Call center analytics transcribing customer calls at scale
  • Meeting note systems (Zoom transcripts, Teams summaries)
  • Accessibility teams captioning content
  • AI app developers building voice-to-text pipelines

Why teams pick WhisperX over alternatives

  • BSD-4-Clause license: open source, no usage fees
  • Faster than vanilla Whisper by 10-70× (batched + VAD + faster-whisper backend)
  • Word-level timestamps — forced alignment via wav2vec2
  • Diarization via pyannote-audio integration
  • Used in production by podcasts and transcription services
  • Competitive with commercial APIs for many use cases, with no per-minute fees (you pay only for your own hardware)
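Much of the "batched + VAD" speed-up comes from cutting the audio at speech boundaries and packing speech regions into chunks of up to about 30 seconds (Whisper's input window), which can then be transcribed as a batch while long silences are skipped. The sketch below shows a simple greedy version of that packing step. The 30 s limit matches Whisper's window, but the greedy rule is an illustrative assumption, not WhisperX's exact algorithm.

```python
def merge_speech_regions(regions: list[tuple[float, float]],
                         max_chunk: float = 30.0) -> list[tuple[float, float]]:
    """Greedily pack VAD speech regions (start, end) into chunks of at most max_chunk seconds.

    Short gaps inside a chunk are kept; silence between chunks is never transcribed.
    """
    pieces = []
    for start, end in sorted(regions):
        # Split any region longer than one window into window-sized pieces.
        while end - start > max_chunk:
            pieces.append((start, start + max_chunk))
            start += max_chunk
        pieces.append((start, end))

    chunks: list[tuple[float, float]] = []
    for start, end in pieces:
        if chunks and max(end, chunks[-1][1]) - chunks[-1][0] <= max_chunk:
            # Still fits in the current window: extend it.
            chunks[-1] = (chunks[-1][0], max(end, chunks[-1][1]))
        else:
            # Would exceed the window: start a new chunk, skipping the silence in between.
            chunks.append((start, end))
    return chunks


if __name__ == "__main__":
    speech = [(0.0, 4.0), (6.0, 20.0), (25.0, 29.0), (40.0, 50.0)]
    print(merge_speech_regions(speech))  # → [(0.0, 29.0), (40.0, 50.0)]
```

In this example the 29 to 40 s silence is dropped entirely, and each chunk fits one Whisper window, so the chunks can be stacked into a single GPU batch.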

Integrations

  • REST API server — POST audio → JSON/SRT/VTT
  • Whisper backbone — uses faster-whisper or original Whisper
  • pyannote-audio for speaker diarization
  • VAD — Silero or pyannote VAD preprocessor
  • Pair with: Ollama/vLLM for "transcribe → summarize" pipelines
  • Output formats: JSON, SRT, VTT, TSV, TXT
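For the "transcribe → summarize" pairing, the diarized segments can be flattened into a speaker-labelled transcript and sent to a local LLM. The sketch below targets Ollama's `/api/generate` endpoint on its default port 11434; the model name `llama3`, the prompt wording, and the segment shape are assumptions to adapt. Only `summarize()` needs a running Ollama server.

```python
import json
import urllib.request

# Ollama's default address; adjust for your deployment.
OLLAMA_URL = "http://localhost:11434/api/generate"


def speaker_transcript(segments: list[dict]) -> str:
    """Collapse consecutive segments by the same speaker into 'SPEAKER: text' lines."""
    lines: list[list[str]] = []  # each entry: [speaker, text]
    for seg in segments:
        speaker = seg.get("speaker") or "UNKNOWN"
        text = seg["text"].strip()
        if lines and lines[-1][0] == speaker:
            lines[-1][1] += " " + text
        else:
            lines.append([speaker, text])
    return "\n".join(f"{spk}: {txt}" for spk, txt in lines)


def build_summary_request(segments: list[dict], model: str = "llama3") -> dict:
    """Build a non-streaming /api/generate body; 'llama3' is a placeholder model name."""
    prompt = ("Summarize this transcript in a few bullet points, "
              "attributing key points to speakers:\n\n" + speaker_transcript(segments))
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(segments: list[dict], model: str = "llama3") -> str:
    """POST the request to Ollama and return the generated text."""
    body = json.dumps(build_summary_request(segments, model)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Merging consecutive turns before prompting keeps the transcript compact, which matters for long meetings that approach the model's context limit.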

Notable users & community

  • 17k+ GitHub stars on the upstream WhisperX repository
  • Used by podcast platforms (Podscribe-style services)
  • Featured in major transcription tooling roundups
  • Active research community + commercial integrations
  • Multiple production-tested deployments

Tips & operations

  • VRAM: about 6 GB of GPU memory for the large-v3 model; a CPU-only fallback also works (about 4 GB of memory) but runs far slower
  • Speed: 70× real-time on RTX 4090; 10-30× on RTX 3060
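  • Diarization: requires a Hugging Face access token in HF_TOKEN and accepting the pyannote model terms on Hugging Face
  • VAD preprocessing: skips silence before transcription, which gives a large speed-up on podcasts
  • Language detection: automatic by default; set the language explicitly for better accuracy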
  • Output formats:
    • JSON: full timestamps + speaker labels
    • SRT/VTT: ready for video players
    • TSV: tabular for spreadsheets
  • Best languages: EN, ES, FR, DE, IT, PT, RU, ZH, JA
  • Production: batch via API, async queue, file upload limit ~100 MB
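The tips above map onto a handful of WhisperX CLI flags. The helper below only assembles an argument list for the upstream `whisperx` command (flags as listed by `whisperx --help` in recent releases; verify against your installed version). It is a hypothetical convenience wrapper, not part of the BluixApps image, whose own entry point may differ; the file paths are placeholders.

```python
import os
import shlex
from typing import Optional


def whisperx_command(audio_path: str, output_dir: str, *,
                     language: Optional[str] = None,
                     output_format: str = "srt",
                     diarize: bool = False,
                     model: str = "large-v3",
                     batch_size: int = 16,
                     compute_type: str = "float16",
                     env: Optional[dict] = None) -> list[str]:
    """Build a whisperx CLI argument list (hypothetical helper)."""
    env = os.environ if env is None else env
    cmd = ["whisperx", audio_path,
           "--model", model,
           "--batch_size", str(batch_size),   # lower this if GPU memory runs out
           "--compute_type", compute_type,    # "int8" for CPU or low-memory GPUs
           "--output_format", output_format,  # e.g. srt, vtt, json, tsv, txt
           "--output_dir", output_dir]
    if language:
        # An explicit language skips auto-detection.
        cmd += ["--language", language]
    if diarize:
        token = env.get("HF_TOKEN")
        if not token:
            raise ValueError("diarization needs HF_TOKEN and accepted pyannote model terms")
        cmd += ["--diarize", "--hf_token", token]
    return cmd


if __name__ == "__main__":
    # Placeholder paths and token, for illustration only.
    cmd = whisperx_command("episode.mp3", "/data/output", language="en",
                           diarize=True, env={"HF_TOKEN": "hf_..."})
    print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it; building it in one place keeps language, diarization, and output-format choices consistent across a batch.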

What we ship in BluixApps

  • Docker (ghcr.io/jim60105/whisperx:latest) with NVIDIA runtime
  • Persistent volumes: audio (input), output (transcripts), models cache
  • Port 8002 mapped
  • HF_TOKEN environment variable for diarization
  • Install report at /root/bluixapps/whisperx.txt
  • Sample curl commands for transcription + diarization
  • Output format selection guide
  • Use case examples (podcast, video subtitles, meetings)
  • Pairing suggestions (LLM for summarization)
  • GPU pre-flight check via bluixapps_ensure_nvidia_runtime
  • Backup hook covers audio + output + models cache
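The bundled `bluixapps_ensure_nvidia_runtime` check runs at install time. If you want a similar sanity check in your own job scripts, a minimal hypothetical version could look like the following; the volume names, paths, and the checks themselves are illustrative assumptions and do not describe what the BluixApps helper actually does.

```python
import os
import shutil
from typing import Callable, Optional


def preflight(volumes: dict[str, str],
              want_diarization: bool = True,
              env: Optional[dict] = None,
              which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    # No nvidia-smi usually means no NVIDIA driver/runtime is visible here.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found: expect CPU-only speed")
    if want_diarization and not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set: diarization will fail")
    for name, path in volumes.items():
        if not os.path.isdir(path):
            problems.append(f"{name} volume {path!r} does not exist")
        elif not os.access(path, os.W_OK):
            problems.append(f"{name} volume {path!r} is not writable")
    return problems


if __name__ == "__main__":
    # Hypothetical volume paths; substitute your own mounts.
    issues = preflight({"audio": "/data/audio", "output": "/data/output",
                        "models": "/data/models"})
    print("ready" if not issues else "\n".join(issues))
```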

Read this app's deep dive on bluix.app.

Get this app — pick a BluixApps plan

Same catalog across plans; higher tiers scale tenant isolation, white-label options and support level.

Tier       | Tenants   | Catalog           | Support         | White-label | Monthly
Stacks     | 1         | 19 curated stacks | Standard        | n/a         | $19/mo
Starter    | 10        | Full catalog      | Standard        | +$15–25/mo  | $49/mo
Pro        | 25        | Full catalog      | Priority bugfix | +$15–25/mo  | $149/mo
Growth     | 100       | Full catalog      | Priority bugfix | +$15–25/mo  | $349/mo
Scale      | 500       | Full catalog      | 7-day window    | +$15–25/mo  | $799/mo
Enterprise | Unlimited | Full catalog      | Priority 7-day  | Bundled     | $1,499/mo
