Ollama

App in the BluixApps catalog

What it is

Ollama is the de-facto standard for running local large language models on your own hardware. A single binary + REST API that pulls models from a public registry (Llama 3.3, Mistral, Qwen, DeepSeek, Phi-4 and dozens more), handles quantization, GPU offload, and exposes a simple /api/generate and /api/chat interface that's API-compatible with the OpenAI SDK.

It's the boring, reliable engine that every other self-hosted AI tool ends up integrating against — Open WebUI, AnythingLLM, LibreChat, Flowise, LangChain, LiteLLM, n8n.

What it's for

Private chat assistants — internal company chat that never sends prompts to OpenAI
GDPR-compliant LLM access — EU customers in healthcare, legal, finance who can't push prompts to US clouds
Cost control at scale — predictable per-month VPS bill vs metered API spend
Air-gapped inference — on-prem or restricted-network environments
AI app development backbone — local dev loop for engineers building on top of LLMs

Who it's for

AI developers & ML engineers — fast local dev loop, no API rate limits, no $ per token while building
Privacy-bound enterprises — legal, healthcare, finance, gov teams forbidden from US-hosted LLM APIs
Hosting providers — resellers offering "private AI VPS" to their customers as a higher-margin SKU
Researchers & academics — evaluating open models without paying OpenAI / Anthropic per experiment
Indie SaaS founders — predictable per-month VPS cost beats unpredictable per-token bills as traffic grows

Why teams pick Ollama over alternatives

OpenAI-compatible API — most existing client code works with a URL change
Massive model catalog with one-command pulls (ollama pull llama3.3)
Apache 2.0 license — commercial use unencumbered
CPU-capable for small models (TinyLlama, Phi-3) — runs on a $7/mo VPS for testing
GPU optional but supported (CUDA, ROCm, Apple Metal) when you scale to 7B+ models
Single binary — operational simplicity, no Python venv hell

Integrations

Chat UIs — Open WebUI, AnythingLLM, LibreChat, Khoj all detect Ollama as first-class backend
Workflow builders — n8n + Flowise + Langflow + Typebot have native Ollama nodes
LLM SDKs — LangChain, LlamaIndex, Semantic Kernel, Haystack all support Ollama natively
OpenAI-proxy gateways — LiteLLM proxies Ollama as if it were OpenAI for legacy clients
IDE assistants — Continue.dev, Aider, Cline, Cody let devs hit local Ollama for code completion
Model formats — pulls Hugging Face GGUF directly; Modelfile lets you fork & customize
Embeddings endpoint — /api/embeddings works with Chroma, Qdrant, pgvector RAG stacks

Notable users & community

110k+ GitHub stars, top of awesome-selfhosted AI category
Integrated by Continue.dev, Cline, Aider, LangChain, LlamaIndex, LiteLLM, OpenWebUI as a first-class backend
Active Discord, weekly model drops, strong macOS and Linux maintainer community
Backed by ollama.ai company — sustainable dev model with permissive Apache 2.0 license
Cited in countless "self-host your AI stack" guides on r/selfhosted, r/LocalLLaMA

Tips & operations

Pre-pull models before exposing the service — first request triggers a multi-GB download that times out user calls
Tune OLLAMA_KEEP_ALIVE — default unloads model after 5 min idle; set 1h for warm latency, -1 to keep forever
Verify GPU detection with ollama ps — if model says "100% CPU" you're not using your GPU; check NVIDIA drivers + CUDA toolkit
Never expose Ollama directly — no built-in auth; always behind nginx + basic auth, OAuth proxy, or a chat-UI gateway
Memory budget rule — 7B Q4 ≈ 5 GB, 13B Q4 ≈ 9 GB, 70B Q4 ≈ 40 GB; size VPS accordingly
Disk cleanup — ollama list then ollama rm unused; models silently accumulate in /usr/share/ollama/.ollama/models

What we ship in BluixApps

Docker compose stack: Ollama server + GPU passthrough config (off by default)
Pre-allocated model storage volume at /var/lib/ollama for persistence across upgrades
Pinned ollama/ollama:0.5.4 image, tracked weekly against upstream
HTTP-only by default on 127.0.0.1:11434; SSL + auth via Nginx Proxy Manager when paired with Open WebUI
Sizing guidance shipped in customer docs: 8 GB RAM minimum for 3B models, 16 GB recommended for 7B, GPU required for 13B+
Backup hook captures /var/lib/ollama before each update (models can be 4-20 GB — opt-in)

Read this app's deep dive on bluix.app ↗

Get this app — pick a BluixApps plan

Same catalog. Scaling tenant isolation, white-label and support tier.

Tier	Tenants	Catalog	Support	White-label	Monthly
Stacks	1	19 curated stacks	Standard	—	$19/mo	Detail Deploy
Starter	10	Full catalog	Standard	+$15–25/mo	$49/mo	Detail Deploy
Pro	25	Full catalog	Priority bugfix	+$15–25/mo	$149/mo	Detail Deploy
Growth	100	Full catalog	Priority bugfix	+$15–25/mo	$349/mo	Detail Deploy
Scale	500	Full catalog	7-day window	+$15–25/mo	$799/mo	Detail Deploy
Enterprise	Unlimited	Full catalog	Priority 7-day	Bundled	$1,499/mo	Detail Deploy

Ollama

What it is

What it's for

Who it's for

Why teams pick Ollama over alternatives

Integrations

Notable users & community

Tips & operations

What we ship in BluixApps

Get this app — pick a BluixApps plan

BluixApps Stacks — entry tier, single VPS managed

What's included

What's NOT in this tier

Best for

Plan facts

BluixApps Starter — full catalog, up to 10 isolated tenants

What's included

Best for

Where to upgrade from here

Plan facts

BluixApps Pro — 25 isolated tenants, priority bugfix lane

What's included on top of Starter

Best for

Plan facts

BluixApps Growth — 100 tenants, scale-up reseller toolkit

What's included on top of Pro

Best for

Plan facts

BluixApps Scale — 500 tenants, 7-day support window

What's included on top of Growth

Best for

Where to upgrade from here

Plan facts

BluixApps Enterprise — unlimited tenants, white-label bundled

What's included on top of Scale

Best for

Plan facts

Ollama

What it is

What it's for

Who it's for

Why teams pick Ollama over alternatives

Integrations

Notable users & community

Tips & operations

What we ship in BluixApps

Get this app — pick a BluixApps plan

BluixApps Stacks — entry tier, single VPS managed

What's included

What's NOT in this tier

Best for

Plan facts

BluixApps Starter — full catalog, up to 10 isolated tenants

What's included

Best for

Where to upgrade from here

Plan facts

BluixApps Pro — 25 isolated tenants, priority bugfix lane

What's included on top of Starter

Best for

Plan facts

BluixApps Growth — 100 tenants, scale-up reseller toolkit

What's included on top of Pro

Best for

Plan facts

BluixApps Scale — 500 tenants, 7-day support window

What's included on top of Growth

Best for

Where to upgrade from here

Plan facts

BluixApps Enterprise — unlimited tenants, white-label bundled

What's included on top of Scale

Best for

Plan facts

Generate Password

Generate Password