XTTS

App in the BluixApps catalog

What it is

XTTS-v2 is Coqui AI's multilingual text-to-speech model: 17 languages, voice cloning from a 6-second sample, expressive delivery, and streaming output. It is one of the most widely used open-weights models for self-hosted speech synthesis.

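The voice equivalent of "open SDXL": openly downloadable weights with strong quality. Note that the weights ship under the Coqui Public Model License (CPML), which limits the model and its outputs to non-commercial use, so review the license before building a paid product on it.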

What it's for

  • Multi-lingual TTS — 17 languages from one model
  • Voice cloning — 6-second sample → speech in cloned voice
  • Real-time streaming — chunked audio output, low latency
  • Cross-lingual generation — English speaker → speak in Spanish/French/Italian
  • Emotion-aware delivery — natural prosody, not robotic
  • API server — REST endpoints for programmatic use

Who it's for

  • Podcast producers generating multi-language content
  • Game studios creating character voices
  • Educational platforms narrating content in multiple languages
  • Marketers producing demo videos at scale
  • Accessibility teams auto-narrating articles for screen readers
  • Hosting providers selling voice synthesis services

Why teams pick XTTS-v2 over alternatives

  • Open licensing: code under MPL-2.0, model weights under CPML (non-commercial use only; review the terms before any commercial deployment)
  • 17 languages — broader coverage than F5-TTS, ChatTTS
  • Voice cloning quality: a convincing clone from a 6-second sample
  • Streaming server — production-ready API
  • Coqui pedigree — speech-tech veterans (formerly Mozilla DeepSpeech team)
  • Active community — frequent fine-tuned forks for specific languages

Integrations

  • REST API server: /tts_stream endpoint for programmatic use
  • WebSocket support for real-time streaming
  • Speaker library — reference voice samples stored persistently
  • HuggingFace integration — model versions tracked
  • Pair with Whisper — speech → text → translate → re-speak in new voice
  • Pair with LLM — text generation → XTTS narration
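
As a rough illustration of the REST integration listed above, the sketch below builds a POST request for the /tts_stream endpoint and writes the streamed audio to disk as it arrives. The JSON field names (speaker_embedding, gpt_cond_latent, add_wav_header, stream_chunk_size) are assumptions based on the upstream xtts-streaming-server project, and the speaker dict is assumed to come from the server's speaker-cloning endpoint. Confirm both against the Swagger docs at /docs on your install before relying on them.

```python
import json
import urllib.request


def build_tts_stream_request(base_url, text, language, speaker, chunk_size=20):
    """Build (but do not send) a POST request for the /tts_stream endpoint.

    `speaker` is assumed to be a dict holding `speaker_embedding` and
    `gpt_cond_latent`, as returned by the server's cloning endpoint.
    """
    payload = {
        "text": text,
        "language": language,
        "speaker_embedding": speaker["speaker_embedding"],
        "gpt_cond_latent": speaker["gpt_cond_latent"],
        "add_wav_header": True,                # assumed field name
        "stream_chunk_size": str(chunk_size),  # assumed field name and type
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tts_stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_to_file(request, out_path, read_size=4096):
    """Send the request and write audio bytes to disk as they arrive.

    Returns the total number of bytes written.
    """
    total = 0
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while True:
            chunk = response.read(read_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

A typical call would load a previously saved speaker dict, build the request against http://localhost:5002 (the port from this listing), and pass it to stream_to_file. A smaller chunk size should lower time-to-first-audio at some cost in throughput.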

Notable users & community

  • 33k+ GitHub stars (parent Coqui TTS repo)
  • Coqui AI (founded by ex-Mozilla DeepSpeech team)
  • Industry-standard for open TTS
  • Used in commercial products + academic research
  • Active fine-tuning community on HuggingFace

Tips & operations

  • Reference voice: 6-30 seconds, clean speech, low noise, single speaker
  • Languages supported: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko, hi
  • Memory: a GPU with 6 GB VRAM is optimal; CPU fallback works with 4 GB RAM, at lower speed
  • Streaming mode: chunk size affects latency vs throughput tradeoff
  • Speaker storage: /opt/xtts/speakers/ keeps your reference voices
  • Production: reverse proxy + auth, rate limiting via gateway
  • Responsible use: voice cloning has misuse potential, so clone only with the speaker's consent and disclose AI-generated audio
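
The reference-voice and language tips above can be partly automated before a clip reaches the model. The following is a minimal sketch using only the Python standard library; it checks the 6-30 second window and the language list from this page. The mono-channel check is an added assumption (not a requirement stated here), the function names are illustrative, and noise level or speaker count cannot be verified this way. It handles uncompressed PCM WAV files only.

```python
import wave

# Language codes copied from the list above.
SUPPORTED_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})

MIN_SECONDS = 6.0
MAX_SECONDS = 30.0


def check_language(code):
    """Return the normalized language code, or raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


def check_reference_clip(source):
    """Return a list of problems found in a WAV reference clip (empty if none).

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as clip:
        duration = clip.getnframes() / clip.getframerate()
        channels = clip.getnchannels()

    problems = []
    if duration < MIN_SECONDS:
        problems.append(f"too short: {duration:.1f}s, need at least {MIN_SECONDS:.0f}s")
    if duration > MAX_SECONDS:
        problems.append(f"too long: {duration:.1f}s, keep under {MAX_SECONDS:.0f}s")
    if channels != 1:
        # Assumption: mono is preferred for a single-speaker reference.
        problems.append(f"expected mono audio, got {channels} channels")
    return problems
```

Running check_reference_clip on each upload before storing it under the speakers directory keeps obviously unusable clips out of the library.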

What we ship in BluixApps

  • Docker (ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121)
  • Persistent volumes: models, output, speakers (reference voices)
  • COQUI_TOS_AGREED=1 + MODEL_NAME pre-set
  • Port 5002 (default XTTS) with Swagger docs at /docs
  • Install report at /root/bluixapps/xtts.txt
  • Acceptable Use Policy noted (no impersonation without consent)
  • Sample API calls for voice cloning + text-to-speech in install report
  • GPU pre-flight check via bluixapps_ensure_nvidia_runtime
  • Backup hook covers speakers + outputs
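
After a deploy, it can be useful to wait until the server actually answers before sending synthesis requests, since the first start may spend some time loading the model. The sketch below polls the Swagger page at /docs on port 5002, both taken from the list above. The function names, retry counts, and delays are illustrative assumptions, not part of the BluixApps tooling.

```python
import time
import urllib.request


def is_ready(base_url, opener=urllib.request.urlopen, timeout_s=3.0):
    """Return True if GET <base_url>/docs answers with HTTP 200."""
    try:
        with opener(base_url.rstrip("/") + "/docs", timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError are both subclasses of OSError,
        # so this covers refused connections, timeouts, and HTTP errors.
        return False


def wait_until_ready(base_url, attempts=30, delay_s=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll the server up to `attempts` times; return True once it is ready."""
    for attempt in range(attempts):
        if is_ready(base_url, opener=opener):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

For example, wait_until_ready("http://localhost:5002") returns True as soon as the docs page responds, or False after roughly a minute with the defaults shown. The opener and sleep parameters exist only so the probe can be exercised without a live server.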

Get this app — pick a BluixApps plan

Same catalog across plans. Tiers scale by tenant isolation, white-label options, and support level.

Tier | Tenants | Catalog | Support | White-label | Monthly
Stacks | 1 | 19 curated stacks | Standard | - | $19/mo
Starter | 10 | Full catalog | Standard | +$15–25/mo | $49/mo
Pro | 25 | Full catalog | Priority bugfix | +$15–25/mo | $149/mo
Growth | 100 | Full catalog | Priority bugfix | +$15–25/mo | $349/mo
Scale | 500 | Full catalog | 7-day window | +$15–25/mo | $799/mo
Enterprise | Unlimited | Full catalog | Priority 7-day | Bundled | $1,499/mo
