F5-TTS

App in the BluixApps catalog

What it is

F5-TTS is a zero-shot text-to-speech model from the SWivid team at Shanghai Jiao Tong University, built on flow matching with a Diffusion Transformer (DiT) architecture. It clones a voice from a 10-second sample and is faster than XTTS-v2 with higher fidelity, which makes it the newer challenger to XTTS on the open TTS leaderboard.

When you need TTS quality that approaches commercial APIs, F5-TTS is the open option.

What it's for

  • High-fidelity TTS — closer to commercial APIs than XTTS
  • Voice cloning from 10-second reference
  • Native English + Chinese — primary languages, with community LoRAs for others
  • Voice chat mode — TTS + ASR loop for conversational systems
  • Faster inference than XTTS — 3-5× real-time on RTX 3090
  • Emotion + prosody control — nuanced delivery options

Who it's for

  • Premium audio content creators demanding closer-to-commercial quality
  • Voice chat product builders (companion AI, assistant interfaces)
  • Audiobook producers for English / Chinese content
  • AI startups integrating high-quality TTS in their stack
  • Hosting providers selling a premium voice tier

Why teams pick F5-TTS over alternatives

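  • MIT-licensed code; the pretrained checkpoints carry their own license (CC-BY-NC upstream at the time of writing), so check it before commercial use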
  • Highest quality in late-2024 open TTS benchmarks (matches/beats XTTS for EN/ZH)
  • Faster than XTTS by 2-3×
  • Better emotion control than XTTS
  • Voice chat mode built-in — TTS + ASR ready loop
  • Active research backing: Shanghai Jiao Tong University team + community

Integrations

  • Gradio web UI with multi-speech + voice chat tabs
  • Gradio API auto-exposed at /api/predict/0
  • HuggingFace Diffusers-style pipeline
  • Pair with LLM — voice chat with Ollama/vLLM
  • Pair with Whisper — voice chat loop (ASR → LLM → F5-TTS)
  • Community LoRAs for additional languages (Italian, Spanish, French)
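
The ASR → LLM → F5-TTS pairing above can be pictured as one function call per conversational turn. The sketch below is a hypothetical illustration, not F5-TTS's own voice chat code: `voice_chat_turn` and its three callables are names invented here. In a real deployment each callable would wrap Whisper, an Ollama or vLLM endpoint, and F5-TTS respectively.

```python
from typing import Callable, Tuple


def voice_chat_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Tuple[str, str, bytes]:
    """Run one conversational turn: speech in, speech out."""
    user_text = transcribe(audio_in)        # ASR stage (e.g. Whisper)
    reply_text = generate_reply(user_text)  # LLM stage (e.g. Ollama or vLLM)
    reply_audio = synthesize(reply_text)    # TTS stage (e.g. F5-TTS)
    return user_text, reply_text, reply_audio


if __name__ == "__main__":
    # Stand-in stubs so the sketch runs without any model installed.
    heard, said, audio = voice_chat_turn(
        b"\x00\x01",
        transcribe=lambda audio: "hello",
        generate_reply=lambda text: f"You said: {text}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(heard, "|", said, "|", len(audio), "bytes")
```

Passing the stages in as callables keeps the loop independent of any one backend, so a stage can be swapped without touching the others.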

Notable users & community

  • 9k+ GitHub stars
  • Shanghai Jiao Tong University team + community development
  • Featured in late-2024 TTS leaderboard upsets
  • Active Chinese + English community
  • Early commercial integrations emerging

Tips & operations

  • Reference voice: 10-30 sec clean recording, low noise
  • English + Chinese native; other languages via community LoRAs
  • VRAM: 8 GB GPU recommended
  • Voice chat loop: enable TTS+ASR tabs for full conversation
  • Storage: model weights ~1.5 GB (lighter than XTTS)
  • Latency: ~200ms first token (production-grade)
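  • License: code is MIT; the pretrained checkpoints are licensed separately (CC-BY-NC upstream at the time of writing), so verify before commercial use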
  • Compare with XTTS: F5 wins on quality for EN/ZH; XTTS wins on language coverage
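
The 10-30 second guideline for reference clips is easy to check before uploading. The following is a minimal sketch using only Python's standard `wave` module. It assumes the clip is an uncompressed WAV file, and the function names are illustrative. It checks length only and says nothing about noise.

```python
import sys
import wave

MIN_SECONDS = 10.0  # lower bound from the tip above
MAX_SECONDS = 30.0  # upper bound from the tip above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the recommended window."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS


if __name__ == "__main__":
    # Usage: python check_reference.py clip1.wav clip2.wav ...
    for clip in sys.argv[1:]:
        verdict = "ok" if is_usable_reference(clip) else "outside 10-30 s"
        print(f"{clip}: {wav_duration_seconds(clip):.1f} s ({verdict})")
```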

What we ship in BluixApps

  • Cloned SWivid/F5-TTS repo, pip-installed
  • pytorch/pytorch CUDA 12.4 base + ffmpeg
  • Gradio launcher (infer_gradio)
  • Persistent volumes: repo, models (~1.5 GB), output
  • Port 7867 mapped
  • Install report at /root/bluixapps/f5tts.txt
  • Acceptable Use Policy notes (voice cloning ethics)
  • Pairing suggestions (XTTS for language coverage, Whisper for voice chat)
  • GPU pre-flight check via bluixapps_ensure_nvidia_runtime
  • Backup hook covers models + outputs
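
After deployment, a quick way to confirm the Gradio launcher is listening on the mapped port is a plain TCP connect. This is a minimal sketch: port 7867 comes from the list above, while the host name `localhost` and the helper `is_port_open` are assumptions made for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 7867 is the mapped port listed above; "localhost" is an assumption.
    state = "reachable" if is_port_open("localhost", 7867) else "not reachable"
    print(f"Gradio UI on port 7867 is {state}")
```

A successful connect only shows that something is listening on the port. It does not confirm that the model has finished loading.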

Get this app — pick a BluixApps plan

Same catalog across plans; tenant isolation, white-label, and support tier scale with the plan.

Tier        Tenants    Catalog            Support          White-label  Monthly
Stacks      1          19 curated stacks  Standard         -            $19/mo
Starter     10         Full catalog       Standard         +$15–25/mo   $49/mo
Pro         25         Full catalog       Priority bugfix  +$15–25/mo   $149/mo
Growth      100        Full catalog       Priority bugfix  +$15–25/mo   $349/mo
Scale       500        Full catalog       7-day window     +$15–25/mo   $799/mo
Enterprise  Unlimited  Full catalog       Priority 7-day   Bundled      $1,499/mo
