Docling

App in the BluixApps catalog

What it is

Docling is IBM's document conversion library that transforms PDF, DOCX, PPTX, and HTML files into structured Markdown or JSON. It combines layout-aware OCR, table detection, image extraction, and formula recognition, and is built for RAG preprocessing, where document structure matters.

The MIT-licensed open-source release is the same engine IBM uses in its enterprise AI offerings. Its output captures semantic structure such as headings, tables, and lists, not just plain text.

What it's for

RAG preprocessing — convert your PDF library into clean Markdown for embedding
Document digitization — OCR scanned documents with layout preserved
Knowledge base ingestion — extract structured content from messy enterprise docs
Compliance archival — convert physical documents to searchable format
Content migration — DOCX → Markdown for static site generators
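
To illustrate the RAG preprocessing use case above, here is a minimal sketch of why structured Markdown helps: once headings survive conversion, the text can be split into one chunk per section before embedding. The function name `split_by_heading` and the chunk layout are hypothetical, not part of Docling.

```python
import re
from typing import Dict, List

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into chunks, one per heading (hypothetical helper)."""
    chunks: List[Dict[str, str]] = []
    title, body = "", []

    def flush() -> None:
        # Emit a chunk only if there is a heading or some non-blank text.
        if title or any(line.strip() for line in body):
            chunks.append({"heading": title, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

This sketch treats any line starting with `#` as a heading, so `#` comment lines inside fenced code blocks would be split incorrectly; a production chunker would need to track fences.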

Who it's for

AI engineers building RAG pipelines over real-world PDF corpora
Knowledge management teams digitizing legacy document archives
Legal & compliance teams converting contract PDFs into searchable Markdown
Researchers extracting structured data from scientific papers
Tech writers migrating documentation from Word/PDF to Markdown

Why teams pick Docling over alternatives

Layout-aware — preserves table structure, headers, lists (vs simple text extraction)
OCR built-in — handles scanned PDFs with Tesseract integration
Formula recognition — STEM papers with equations stay intact
MIT license — IBM-backed but fully open
Python-first — clean API, easy to integrate
Output flexibility — Markdown, JSON, with optional structured metadata

Integrations

Python API — primary interface; pip install and go
HTTP API mode — Docling-Serve wrapper exposes REST endpoint
OCR engines — Tesseract, EasyOCR pluggable
PDF parsers — docling-parse, pypdfium2 backends
LLM frameworks — LangChain document loader available
Output formats — Markdown, HTML, JSON, DocTags
Embedded image handling — extract or inline as base64
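
As a rough illustration of the Python API entry in the list above, the sketch below converts one document and writes the Markdown to disk. `DocumentConverter`, `convert`, and `export_to_markdown` come from the `docling` package; the helper names `markdown_path_for` and `convert_to_markdown` and the `out` directory are hypothetical.

```python
from pathlib import Path


def markdown_path_for(source: str, out_dir: str = "out") -> Path:
    """Map an input path or URL to an output .md path (hypothetical helper)."""
    name = source.rstrip("/").split("/")[-1]
    stem = Path(name).stem or "document"
    return Path(out_dir) / f"{stem}.md"


def convert_to_markdown(source: str, out_dir: str = "out") -> Path:
    """Convert one document with Docling and save the result as Markdown."""
    # Imported lazily so the path helper works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # accepts a local path or a URL
    target = markdown_path_for(source, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return target
```

Calling `convert_to_markdown("report.pdf")` (a hypothetical input file) would write `out/report.md`. The first call may take a while because model weights are downloaded on demand.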

Notable users & community

20k+ GitHub stars
Backed by IBM Research with active engineering team
Featured in IBM's enterprise AI stack
Strong adoption in research / academic RAG pipelines
Growing community around document AI use cases

Tips & operations

Use HTTP mode for multi-language stacks — the embedded Python API only suits Python apps; the REST endpoint works from any client
Pre-warm models — the first request downloads several hundred MB of model weights, so bake them into the image or keep a persistent cache
OCR vs text extraction — disable OCR for born-digital PDFs; this can cut processing time by roughly 10×
Batch processing — Docling can handle multiple docs per request; batch when possible
GPU acceleration — optional, but it significantly speeds up OCR on scanned document archives
Output cleanup — Docling Markdown may need light post-processing before LLM ingestion
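
The OCR, batching, and cleanup tips above can be sketched together in Python. `PdfPipelineOptions.do_ocr`, `PdfFormatOption`, and `convert_all` are part of the `docling` package as far as we know; the helper names, the choice to skip failed files silently, and the `<!-- image -->` placeholder string are assumptions to verify against your installed version.

```python
import re
from pathlib import Path
from typing import Dict, Iterable


def build_converter(enable_ocr: bool = False):
    """Return a Docling converter whose PDF pipeline has OCR on or off."""
    # Lazy imports so clean_markdown() below works without docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = enable_ocr  # False skips OCR for born-digital PDFs
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )


def clean_markdown(markdown: str, image_placeholder: str = "<!-- image -->") -> str:
    """Light cleanup before LLM ingestion (placeholder string is an assumption)."""
    text = markdown.replace(image_placeholder, "")
    # Note: this also removes Markdown hard line breaks (two trailing spaces).
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip() + "\n"


def convert_batch(paths: Iterable[Path], enable_ocr: bool = False) -> Dict[str, str]:
    """Convert several files with one converter; failed files are skipped."""
    from docling.datamodel.base_models import ConversionStatus

    converter = build_converter(enable_ocr)
    cleaned: Dict[str, str] = {}
    for result in converter.convert_all(list(paths), raises_on_error=False):
        if result.status == ConversionStatus.SUCCESS:
            cleaned[result.input.file.name] = clean_markdown(
                result.document.export_to_markdown()
            )
    return cleaned
```

Reusing one converter across the batch avoids reloading models for every file. Results are keyed by file name here, so inputs that share a name in different directories would overwrite each other.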

What we ship in BluixApps

Docker compose: Docling-Serve HTTP wrapper
Image quay.io/ds4sd/docling-serve, pinned to a release tag rather than a floating latest
HTTPS via Let's Encrypt; API key auth enabled
OCR enabled by default with Tesseract
Persistent model cache volume to avoid re-download on restart
API rate limiting configured for fair use
Backup not needed (stateless service)
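
For non-Python or remote clients, a call against the HTTPS endpoint with an API key might look like the sketch below, using only the standard library. The endpoint path `/v1/convert/source`, the payload shape, and the `X-Api-Key` header name are assumptions that vary between Docling-Serve versions and deployments; check the API docs served by your instance before relying on them. The host name in the usage note is a placeholder.

```python
import json
import urllib.request
from typing import Any, Dict


def build_convert_request(
    base_url: str,
    document_url: str,
    api_key: str,
    do_ocr: bool = True,
    path: str = "/v1/convert/source",   # assumed endpoint path
    key_header: str = "X-Api-Key",      # assumed auth header name
) -> urllib.request.Request:
    """Build a POST request asking the service to convert a document by URL."""
    payload = {
        # Assumed payload shape; older releases used a different layout.
        "sources": [{"kind": "http", "url": document_url}],
        "options": {"to_formats": ["md"], "do_ocr": do_ocr},
    }
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", key_header: api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> Dict[str, Any]:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be `send(build_convert_request("https://docling.example.com", "https://example.com/a.pdf", api_key))`. The generous timeout reflects that OCR-heavy documents can take minutes; passing `do_ocr=False` is the HTTP counterpart of disabling OCR for born-digital PDFs.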

Get this app — pick a BluixApps plan

Same catalog. What scales between plans is tenant isolation, white-label, and support tier.

Tier	Tenants	Catalog	Support	White-label	Monthly
Stacks	1	19 curated stacks	Standard	—	$19/mo
Starter	10	Full catalog	Standard	+$15–25/mo	$49/mo
Pro	25	Full catalog	Priority bugfix	+$15–25/mo	$149/mo
Growth	100	Full catalog	Priority bugfix	+$15–25/mo	$349/mo
Scale	500	Full catalog	7-day window	+$15–25/mo	$799/mo
Enterprise	Unlimited	Full catalog	Priority 7-day	Bundled	$1,499/mo
