Infinity

App in the BluixApps catalog

What it is

Infinity Embedding is a high-throughput embedding inference server: a REST API that serves text and image embeddings from models such as BGE, E5, Jina, Cohere and more. Its OpenAI-compatible /v1/embeddings endpoint makes it a drop-in replacement for the OpenAI embeddings API.

It is billed as 5-20× faster than HuggingFace Inference for embeddings (the exact gain depends on model, hardware and batch size) and is designed for production RAG pipelines.
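A minimal sketch of calling the endpoint from Python with only the standard library, assuming an instance reachable at `http://localhost:7997` (the upstream default port; on a BluixApps install substitute port 7884) and the `/v1/embeddings` path given above. The request and response shapes follow the OpenAI embeddings format (`model` and `input` in the request; `data[].embedding` and `data[].index` in the response). The helper names are hypothetical; check the Swagger UI at `/docs` on your instance for the exact route.

```python
import json
from urllib import request


def build_embeddings_request(base_url: str, model: str, texts: list[str]) -> request.Request:
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",  # path as documented on this page
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(payload: dict) -> list[list[float]]:
    """Return one vector per input, ordered by each item's 'index' field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

To send it, pass the request to `urllib.request.urlopen` and feed the decoded JSON body (`json.load(response)`) to `parse_embeddings_response`. Sorting by `index` keeps vectors aligned with the input texts even if a server returns items out of order.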

What it's for

Embedding inference at scale: RAG, search, recommendations
OpenAI-compatible API: drop-in replacement for OpenAI embeddings
Multi-model serving: several embedding models in one container
High throughput: dynamic batching of incoming requests
Long-document embeddings: jina-embeddings-v3 accepts inputs of up to 8,192 tokens
Multilingual embeddings: BGE-M3, multilingual-e5
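Semantic search and RAG retrieval come down to comparing a query embedding against document embeddings. A brute-force sketch with cosine similarity, assuming the vectors were already fetched from the embeddings endpoint (the function names are illustrative); in production a vector store such as Qdrant does this with an index rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Score every document against the query and return the k most similar (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Many embedding models return normalized vectors, in which case the dot product alone gives the same ranking; the explicit normalization here keeps the sketch correct either way.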

Who it's for

RAG pipeline builders needing embeddings at scale
Search teams building semantic search
AI app developers integrating embeddings into their stack
AI agencies offering embedding services to clients
Hosting providers selling an embedding API tier

Why teams pick Infinity over alternatives

MIT license: fully open source
Speed: billed as 5-20× faster than HuggingFace Inference
OpenAI-compatible: works with LangChain, LlamaIndex and other OpenAI-style clients
Multi-model: serves several embedding models simultaneously
Active development: maintained by Michael Feil
Production-tested: used in production by AI startups
GPU + CPU: runs on GPU and falls back to CPU when no GPU is available

Integrations

OpenAI v1: /v1/embeddings endpoint
Reranker support — rerank documents post-retrieval
Pair with: Qdrant / Weaviate / Chroma (vector stores)
Pair with: vLLM / Ollama (RAG completion)
Pair with: LangChain / LlamaIndex (orchestration)
Swagger UI at /docs
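Reranking usually runs as a second stage: the embedding search returns a candidate list, then a reranker scores each (query, document) pair and the list is re-sorted. A sketch of that flow, where `score_pair` is a hypothetical stand-in for a call to the reranker model; the toy `keyword_overlap` scorer exists only to make the example runnable and is not how a reranker model scores.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Re-order retrieved candidates by a (query, document) relevance score, highest first."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def keyword_overlap(query: str, doc: str) -> float:
    """Toy scorer for illustration only: counts shared whitespace-separated words."""
    return float(len(set(query.split()) & set(doc.split())))
```

For example, `rerank("gpu embedding server", hits, keyword_overlap, top_n=2)` keeps the two candidates sharing the most words with the query; swapping in a real reranker call only changes the `score_pair` argument.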

Notable users & community

2k+ GitHub stars (newer but rapidly growing)
Michael Feil + contributors
Featured in production RAG roundups
Active community feedback + integrations
Multiple AI startups in production

Tips & operations

Recommended models by use case:
- General English: BAAI/bge-large-en-v1.5 (1024 dim, default)
- General English (lighter): BAAI/bge-base-en-v1.5 (768 dim)
- Multilingual (100+ languages): intfloat/multilingual-e5-large or BAAI/bge-m3
- Code: Salesforce/codet5p-110m-embedding
- Tiny + fast: sentence-transformers/all-MiniLM-L6-v2
- Long docs (8k+ tokens): jinaai/jina-embeddings-v3
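Multi-model: pass --model-id once per model (e.g. --model-id A --model-id B) to serve them side by side; each request selects a model through its model field
VRAM (rough guidance): 4 GB minimum for small/distilled models; 8 GB for large models; 16 GB for jina-v3
Speed: billed as 5-20× higher throughput than vanilla HF (workload-dependent)
vs OpenAI API: no per-token fees + data stays on your server + no provider rate limits (throughput is bounded by your hardware) + multi-model
vs sentence-transformers: reported up to 10× faster batch processing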
Production: reverse proxy + auth + monitoring (Prometheus metrics)
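The VRAM figures above include headroom beyond the model weights. A back-of-envelope sketch of weight memory alone is parameters × bytes per parameter; the ~335M parameter count used below for BAAI/bge-large-en-v1.5 is an approximate figure, so check the model card.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "fp16") -> float:
    """Memory for model weights only, in GiB.

    Excludes activations for batched requests, the CUDA context and runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


fp32_gib = weight_memory_gib(335e6, "fp32")  # approximate bge-large-en-v1.5 size
fp16_gib = weight_memory_gib(335e6, "fp16")
```

For ~335M parameters this gives about 1.25 GiB in fp32 and about 0.62 GiB in fp16, so most of a multi-GB budget presumably goes to batched activations, the CUDA context and runtime overhead rather than the weights themselves.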

What we ship in BluixApps

Docker (michaelf34/infinity:latest)
Default model: BAAI/bge-large-en-v1.5 (configurable via /opt/infinity/.env)
Persistent volume: HF cache (~1-2 GB per model)
Port 7884 (Infinity default 7997)
Swagger UI at /docs
Install report at /root/bluixapps/infinity.txt
Recommended model list by use case
Multi-model serving guide
Infinity vs alternatives comparison
Use case examples (BluixApps catalog search, RAG pipelines)
Pairing suggestions (Qdrant + vLLM + LangChain)
HF_TOKEN environment variable for gated models
GPU pre-flight check via bluixapps_ensure_nvidia_runtime
Backup hook covers HF cache
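The exact BluixApps compose file is not reproduced on this page. As a hypothetical equivalent, the sketch below assembles a `docker run` command for the upstream image that maps host port 7884 to Infinity's default 7997, mounts a host directory as the Hugging Face cache, passes `HF_TOKEN` through from the host environment and serves several models via repeated `--model-id` flags. The host cache path `/opt/infinity/cache` is an assumption, and the container path `/app/.cache` follows the upstream README example; verify both for your image version.

```python
def infinity_docker_argv(
    model_ids: list[str],
    host_port: int = 7884,
    container_port: int = 7997,
    cache_dir: str = "/opt/infinity/cache",  # hypothetical host path
    gpus: bool = True,
) -> list[str]:
    """Assemble (but do not run) a docker run argv for michaelf34/infinity."""
    argv = ["docker", "run", "-d"]
    if gpus:
        argv += ["--gpus", "all"]  # requires the NVIDIA container runtime
    argv += [
        "-p", f"{host_port}:{container_port}",  # host port -> Infinity port
        "-v", f"{cache_dir}:/app/.cache",       # persist downloaded model weights
        "-e", "HF_TOKEN",                       # forward the host's token for gated models
        "michaelf34/infinity:latest",
        "v2",
        "--port", str(container_port),
    ]
    for model_id in model_ids:
        argv += ["--model-id", model_id]  # one flag per served model
    return argv
```

`shlex.join(argv)` prints the result as a shell command, and `subprocess.run(argv, check=True)` runs it.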

Read this app's deep dive on bluix.app.

Get this app — pick a BluixApps plan

From Starter upward every plan includes the full catalog; tiers scale tenant isolation, white-label options and support level.

Tier	Tenants	Catalog	Support	White-label	Monthly price
Stacks	1	19 curated stacks	Standard	Not offered	$19/mo
Starter	10	Full catalog	Standard	+$15–25/mo	$49/mo
Pro	25	Full catalog	Priority bugfix	+$15–25/mo	$149/mo
Growth	100	Full catalog	Priority bugfix	+$15–25/mo	$349/mo
Scale	500	Full catalog	7-day window	+$15–25/mo	$799/mo
Enterprise	Unlimited	Full catalog	Priority 7-day	Bundled	$1,499/mo

Plan summaries:
- BluixApps Stacks: entry tier, single VPS managed
- BluixApps Starter: full catalog, up to 10 isolated tenants
- BluixApps Pro: 25 isolated tenants, priority bugfix lane
- BluixApps Growth: 100 tenants, scale-up reseller toolkit
- BluixApps Scale: 500 tenants, 7-day support window
- BluixApps Enterprise: unlimited tenants, white-label bundled
