Phoenix

App in the BluixApps catalog

What it is

Arize Phoenix is an LLM observability and tracing platform — OpenTelemetry-native, captures every LLM call (prompt, response, tokens, latency, cost) and visualizes traces for debugging. Built specifically for AI applications where stack traces don't tell you why the model said something stupid.

It's the closest OSS equivalent to LangSmith and Helicone, designed by Arize AI (a major ML observability company).

What it's for

  • LLM debugging — replay traces to understand why responses were wrong
  • Cost analysis — track token spend per prompt template, per user, per feature
  • Latency profiling — find slow chain steps in agent workflows
  • Eval frameworks — run benchmarks on LLM outputs with consistent metrics
  • A/B testing prompts — compare prompt versions on real traffic

Who it's for

  • AI engineers debugging RAG and agent pipelines in production
  • ML observability teams standardizing LLM monitoring across products
  • Product teams running prompt experiments with measurable metrics
  • Compliance / audit teams maintaining LLM call audit trails
  • AI platform teams providing observability as a service to internal teams

Why teams pick Phoenix over alternatives

  • OpenTelemetry-native — standard tracing protocol, integrates with any obs stack
  • Apache 2.0 — fully open, commercial use unrestricted
  • LangChain / LlamaIndex first-class — auto-instrumentation, no manual tracing code
  • Evals built-in — LLM eval framework included
  • Self-hosted — keep prompts + responses on your infrastructure
  • Active development — backed by Arize AI commercial product

Integrations

  • Auto-instrumentation — LangChain, LlamaIndex, OpenAI SDK, LiteLLM, Bedrock
  • OpenTelemetry — send traces from any otel-compatible SDK
  • LLM providers — captures calls to OpenAI, Anthropic, Ollama, any OpenAI-compatible
  • Eval frameworks — Phoenix Evals for built-in benchmarking
  • Datasets API — curate prompt/response datasets for fine-tuning
  • Webhook export — push traces to downstream systems
  • REST API — programmatic access to trace data

Notable users & community

  • 5k+ GitHub stars
  • Adopted by Arize AI customers + LangChain users
  • Active Slack community
  • Featured in LLM observability stack guides
  • Strong roadmap with continuous feature additions

Tips & operations

  • Set up auto-instrumentation early — manual tracing is tedious; auto-instrumentation gives 80% coverage in minutes
  • Mind storage growth — every LLM call captured; cold storage / TTL policies essential at scale
  • Sampling for production — high-traffic apps don't need 100% trace sampling; reduce to 10-50% to control cost
  • Use Eval framework — Phoenix Evals = built-in LLM-as-judge evaluators; saves writing eval code
  • Auth via reverse proxy — Phoenix has minimal built-in auth; protect with Authelia / OAuth proxy
  • Persistent storage — traces are valuable data; mount volume from day one

What we ship in BluixApps

  • Docker compose: Phoenix + Postgres + persistent storage
  • Pinned arizephoenix/phoenix:latest (release-tagged)
  • HTTPS via Let's Encrypt
  • Pre-configured OpenTelemetry endpoint for trace ingestion
  • Auto-detects LiteLLM on same VPS for combined observability stack
  • Persistent volume for trace storage
  • Backup hook covers Postgres + trace exports
Read this app's deep dive on bluix.app ↗

Get this app — pick a BluixApps plan

Same catalog. Scaling tenant isolation, white-label and support tier.

TierTenantsCatalogSupportWhite-labelMonthly
Stacks119 curated stacksStandard$19/moDetailDeploy
Starter10Full catalogStandard+$15–25/mo$49/moDetailDeploy
Pro25Full catalogPriority bugfix+$15–25/mo$149/moDetailDeploy
Growth100Full catalogPriority bugfix+$15–25/mo$349/moDetailDeploy
Scale500Full catalog7-day window+$15–25/mo$799/moDetailDeploy
EnterpriseUnlimitedFull catalogPriority 7-dayBundled$1,499/moDetailDeploy

Powered by WHMCompleteSolution