Aphrodite

App in the BluixApps catalog

What it is

Aphrodite Engine is a vLLM fork by Pygmalion AI that adds advanced sampling methods (top-a, min-p, mirostat, smoothing factor), broader quantization (EXL2, GGUF, AQLM, SqueezeLLM), and KoboldAI API compatibility. Designed for roleplay, creative writing, and exploration scenarios that need finer sampling control than vanilla vLLM provides.

What it's for

  • Creative writing pipelines — advanced samplers for varied output
  • Roleplay AI — preserving character voice across long conversations
  • GGUF / EXL2 quantization support (more than vLLM)
  • Triple-API compatibility — OpenAI + KoboldAI + native
  • Karras schedulers — alternative sampling distributions
  • Mirostat / smoothing — target perplexity sampling

Who it's for

  • AI roleplay platforms (character.ai-style)
  • Interactive fiction creators needing varied LLM output
  • Pygmalion AI community members and their products
  • Power users wanting more sampler control than vLLM
  • Researchers exploring novel sampling methods

Why teams pick Aphrodite over alternatives

  • AGPL-3.0 — fully open
  • Advanced samplers not available in vanilla vLLM:
    • min_p (modern alternative to top_p)
    • top_a (probability-shaped truncation)
    • tau / Mirostat (perplexity-based)
    • smoothing_factor (logit smoothing)
  • GGUF + EXL2 quantization — broader than vLLM's GPTQ/AWQ
  • KoboldAI API — drop-in for SillyTavern, KoboldHorde, RisuAI
  • Pygmalion community models work natively

Integrations

  • OpenAI v1: /v1/chat/completions, /v1/completions
  • KoboldAI: /api/v1/generate (for SillyTavern, RisuAI, etc.)
  • Aphrodite native: /v1/internal/* for advanced samplers
  • Quantization: GGUF (llama.cpp), EXL2 (ExLlamaV2), AWQ, GPTQ, SqueezeLLM, Bitsandbytes, AQLM
  • Pair with: SillyTavern (canonical roleplay UI), Pygmalion-tuned models
  • Multi-GPU: --tensor-parallel-size N

Notable users & community

  • 1.5k+ GitHub stars
  • PygmalionAI community + commercial Pygmalion service
  • Used in roleplay AI platforms
  • Active development by Alpin + contributors
  • Featured in r/LocalLLaMA roleplay sub-communities

Tips & operations

  • GGUF for diverse hardware: works on consumer GPUs without modern features
  • EXL2 for speed: fastest quantization format, ExLlamaV2 lineage
  • Sampler combos for RP:
    • min_p: 0.05, top_a: 0.0, temperature: 0.8, smoothing_factor: 0.3
  • Mirostat: target perplexity sampling, set mirostat: 1, mirostat_tau: 5
  • Multi-shard: tensor parallel like vLLM
  • vs vLLM: same core engine, Aphrodite adds samplers + GGUF/EXL2
  • vs TGI: Aphrodite for RP/creative, TGI for HF integration

What we ship in BluixApps

  • Docker (alpindale/aphrodite-engine:latest)
  • Default model: NousResearch/Meta-Llama-3.1-8B-Instruct (configurable)
  • Persistent volume: /opt/aphrodite/models (HF cache)
  • Port 2242 (Aphrodite default)
  • --launch-kobold-api for SillyTavern/RisuAI compatibility
  • Install report at /root/bluixapps/aphrodite.txt
  • Sample API call with advanced samplers (min_p, smoothing_factor)
  • Quantization format guide
  • HF_TOKEN environment variable
  • GPU pre-flight check via bluixapps_ensure_nvidia_runtime
  • Backup hook covers model cache
Read this app's deep dive on bluix.app ↗

Get this app — pick a BluixApps plan

Same catalog. Scaling tenant isolation, white-label and support tier.

TierTenantsCatalogSupportWhite-labelMonthly
Stacks119 curated stacksStandard$19/moDetailDeploy
Starter10Full catalogStandard+$15–25/mo$49/moDetailDeploy
Pro25Full catalogPriority bugfix+$15–25/mo$149/moDetailDeploy
Growth100Full catalogPriority bugfix+$15–25/mo$349/moDetailDeploy
Scale500Full catalog7-day window+$15–25/mo$799/moDetailDeploy
EnterpriseUnlimitedFull catalogPriority 7-dayBundled$1,499/moDetailDeploy

Powered by WHMCompleteSolution