Services
What each of the nine Fabrik services does, how they depend on each other, and how to tell when one is unhealthy.
The Fabrik stack is nine long-running containers. They share one Docker image where possible, one Docker network, and one .env file. This page walks through each service in dependency order — the services lower in the chain have to be up before the ones above them start cleanly.
Dependency graph
┌──────────────────────┐
│ postgres (required)│
│ neo4j (required)│
│ redis (required)│
│ rabbitmq (required)│
└──────────┬───────────┘
│ healthcheck gates
┌──────────┴──────────────────────────────┐
│ backend · celery-worker · celery-beat │
│ event-consumer-job/workflow/output │
└──────────┬──────────────────────────────┘
│
┌────┴────┐
│ frontend│
└─────────┘

Compose uses depends_on: condition: service_healthy, which means the Python services won't even try to start until Postgres reports pg_isready, Redis replies to PING, RabbitMQ passes rabbitmq-diagnostics ping, and Neo4j responds on port 7474. If one data service is slow to come up, the stack waits — it doesn't fail.
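If you want to watch those gates clear during startup, standard Docker tooling shows the same health state Compose is waiting on; a minimal sketch using the service names from this stack:

```bash
# Overall view: the STATUS column shows (healthy) / (health: starting) per service
docker compose ps

# Single service: query the exact health state the depends_on condition gates on
docker inspect --format '{{.State.Health.Status}}' "$(docker compose ps -q postgres)"
```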
Data services
postgres
Image: postgres:17-alpine · Memory: 512 MB · Volume: fabrik_postgres_data
Primary relational store for everything non-graph: users, groups, saved queries, scheduled tasks, AWX templates, audit logs, time machine snapshot metadata. Exposed on 127.0.0.1:5432 so the host can inspect it — not the public network.
Healthcheck: pg_isready -U fabrik every 10 s. If this ever reports unhealthy, every Python service will follow within a minute.
Backup: pg_dump against the container (see Upgrading and backup).
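A minimal dump from the running container might look like the following; the fabrik role comes from the healthcheck above, while the database name is an assumption, so verify both against your .env first:

```bash
# Dump the relational store to a dated, compressed file on the host
# (database name "fabrik" is an assumption -- check your .env)
docker compose exec -T postgres pg_dump -U fabrik fabrik | gzip > fabrik-$(date +%F).sql.gz
```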
neo4j
Image: neo4j:5.26 · Memory: 2 GB · Volumes: fabrik_neo4j_data, fabrik_neo4j_logs
Graph database storing the ACI MIM — class hierarchy, containment rules, property definitions. Populated by the backend on first boot from the MIM registry matching APIC_VERSION, or by explicit MIM imports triggered from the admin UI.
Heap and page cache: NEO4J_HEAP_MAX_SIZE (default 1 GB) and NEO4J_PAGECACHE_SIZE (default 256 MB) together dictate Neo4j's RSS. The 2 GB container limit has about 750 MB of headroom for other JVM needs — don't tune these past that.
Healthcheck: HTTP check against http://localhost:7474 with a 30 s startup grace period. Neo4j takes the longest to warm up; that's normal.
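To confirm the graph is actually populated once the warm-up finishes, cypher-shell ships in the Neo4j image; NEO4J_PASSWORD below is a stand-in for whatever credential your .env configures, exported in your shell:

```bash
# Count nodes in the MIM graph; a freshly seeded graph should be well above zero
docker compose exec neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
  "MATCH (n) RETURN count(n) AS nodes;"
```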
redis
Image: redis:8-alpine · Memory: 256 MB · Volume: fabrik_redis_data
Four overlapping roles:
- Celery broker (Redis DB /0) — task queue for backend → worker dispatch.
- Celery result backend (Redis DB /1) — short-lived result storage.
- Django Channels layer — WebSocket group membership and message routing.
- MIM cache — short-TTL responses from Neo4j (see the cache tiers in backend/mim/cache.py).
No authentication by default — Redis is only reachable from other containers on fabrik-network.
Healthcheck: redis-cli ping.
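A few redis-cli probes cover the roles above; with the Redis broker, each Celery queue is a list keyed by the queue name, so celery below is the default queue:

```bash
# Same probe the healthcheck runs
docker compose exec redis redis-cli ping

# Pending tasks on the default Celery queue (broker lives in DB 0)
docker compose exec redis redis-cli -n 0 llen celery

# Rough count of stored task results (result backend lives in DB 1)
docker compose exec redis redis-cli -n 1 dbsize
```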
rabbitmq
Image: rabbitmq:4.1-management-alpine · Memory: 1 GB · Volume: fabrik_rabbitmq_data
Dedicated broker for AWX event ingestion. AWX webhooks land on the backend, which publishes them to three queues: awx.job.status, awx.workflow.status, awx.job.output. Three separate consumers drain those queues and update the database.
Why not Redis: AWX can burst hundreds of output chunks per second during a big playbook run. Keeping that traffic off Redis prevents it from stalling Celery task dispatch and WebSocket broadcasts.
Management UI: Bound to 127.0.0.1:15672 (not public). Log in with RABBITMQ_USER / RABBITMQ_PASSWORD for queue depth, consumer counts, and message rates.
Healthcheck: rabbitmq-diagnostics ping.
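Besides the management UI, rabbitmqctl inside the container gives the same queue-depth view from the command line:

```bash
# Message backlog and attached consumers for every queue, including the three awx.* queues
docker compose exec rabbitmq rabbitmqctl list_queues name messages consumers
```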
Application services
backend
Image: fabrik-backend:latest (built from backend/Dockerfile) · Memory: 512 MB
Django 5 running under Daphne ASGI. Handles every HTTP request and every WebSocket connection. The container entrypoint runs migrate → bootstrap_mim → daphne, so every restart:
- Applies pending Django migrations (idempotent).
- Seeds Neo4j with the MIM matching APIC_VERSION if the graph is empty.
- Starts the ASGI server.
Exposed port: ${BACKEND_PORT:-8000} on the host. In production, put nginx in front and keep this bound to 127.0.0.1.
Health: Reachable on GET /api/health/ — returns {"status": "ok"} plus version info. Scrape-friendly.
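A quick probe from the host, assuming BACKEND_PORT is left at its default of 8000:

```bash
# Should return {"status": "ok"} plus version info
curl -s http://localhost:8000/api/health/
```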
celery-worker
Image: fabrik-backend:latest (same image as backend) · Memory: 1 GB
The workhorse. One container runs a single Celery process with configurable concurrency (CELERY_WORKER_CONCURRENCY, default 2). It subscribes to seven queues:
celery, query_exec, scheduled, awx_monitor, awx_exec, maintenance, mim_import

Tasks route to a queue based on @shared_task(queue=...) in the code — you don't configure routing in .env. Scale by running more worker containers rather than raising concurrency beyond 4 (Python GIL trade-offs).
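To verify the worker really is consuming all seven queues, Celery's inspect command works from inside the container; -A fabrik is an assumed app module name, so substitute whatever the image's entrypoint actually uses:

```bash
# List the queues this worker is subscribed to
docker compose exec celery-worker celery -A fabrik inspect active_queues
```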
celery-beat
Image: fabrik-backend:latest · Memory: 256 MB
Scheduler. One process, one replica — running two would double-fire every scheduled task. Reads the schedule from the database (django_celery_beat tables) and emits tasks onto Redis for workers to pick up.
Beat has no healthcheck beyond process liveness; watch its logs for Scheduler: Sending due task... lines when you expect a job to fire.
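When a scheduled job doesn't seem to fire, grep beat's logs for the dispatch line before suspecting the worker:

```bash
# Each due task produces a "Sending due task <name>" line when beat dispatches it
docker compose logs celery-beat | grep "Sending due task"
```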
event-consumer-job / workflow / output
Three siblings, same image, different --queue argument. Each subscribes to one RabbitMQ queue and writes AWX updates into Postgres:
| Consumer | Queue | What it writes |
|---|---|---|
| event-consumer-job | awx.job.status | Job lifecycle events (started, successful, failed) |
| event-consumer-workflow | awx.workflow.status | Workflow-level events |
| event-consumer-output | awx.job.output | Streaming stdout chunks — highest volume |
Each has its own healthcheck that opens a Pika connection to RabbitMQ every 30 s. A failed healthcheck usually means RabbitMQ is unreachable, not that the consumer itself is broken.
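Two quick checks separate "RabbitMQ is unreachable" from "a consumer died": the Compose health column for the three services, and whether each awx.* queue still has a consumer attached:

```bash
# Healthcheck status of the three consumers
docker compose ps event-consumer-job event-consumer-workflow event-consumer-output

# Each awx.* queue should list at least one attached consumer
docker compose exec rabbitmq rabbitmqctl list_consumers queue_name | grep awx
```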
frontend
Image: fabrik-frontend:latest · Memory: 256 MB
A multi-stage build: stage 1 runs vite build, stage 2 is nginx:alpine serving the static SPA from /usr/share/nginx/html and reverse-proxying /api/, /admin/, /static/, and /ws/ to the backend container on port 8000. nginx listens on port 80 inside the container; map it to the host with FRONTEND_PORT (default 80).
Because all browser traffic terminates here, you usually don't expose the backend port externally. The frontend is the only public entry point; the backend is reachable only through the nginx proxy on the same container network.
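You can confirm the proxying from the host; both requests below go through nginx, and the second lands on the backend container (FRONTEND_PORT assumed at its default of 80):

```bash
# SPA shell served by nginx
curl -sI http://localhost/

# API request proxied through to the backend
curl -sI http://localhost/api/health/
```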
Optional services
Started with --profile flags, off by default.
docs
Profile: docs · Image: node:22-alpine · Memory: 2 GB
The Fumadocs documentation site you're reading right now. Start it with docker compose --profile docs up -d docs and reach it at http://localhost:4000. Useful if you want the docs bundled inside the same deployment instead of relying on the public site.
gitea
Profile: scm-gitea · Image: gitea/gitea:1.25.4
Lightweight Git server for AWX playbook source control. Only start this if you need AWX to pull playbooks from an SCM and don't already have one (GitLab, GitHub, etc.).
mailpit
Image: axllent/mailpit:latest · Memory: 128 MB
Local SMTP sink for testing. Catches outbound mail from Fabrik and shows it in a web UI at http://localhost:8025. Handy for development — remove or disable in production.
Reading the logs
Every service logs JSON lines to the Docker daemon with a 10 MB × 3 file rotation. Common patterns:
# Everything, followed
docker compose logs -f
# Just one service
docker compose logs -f backend
# Last 100 lines from workers and beat
docker compose logs --tail=100 celery-worker celery-beat
# Event consumers (all three)
docker compose logs -f event-consumer-job event-consumer-workflow event-consumer-output

Scaling guidance
- More users / more queries running concurrently → scale celery-worker (docker compose up --scale celery-worker=3) or raise CELERY_WORKER_CONCURRENCY.
- Lots of AWX output → RabbitMQ consumers are the bottleneck. Run them with higher concurrency or add replicas.
- Big MIM / many APIC versions → raise NEO4J_HEAP_MAX_SIZE and NEO4J_PAGECACHE_SIZE, and grow the container memory limit to match.
- Heavy audit log traffic → Postgres. Consider a managed Postgres and point DATABASE_URL at it.
Always docker stats first, scale second.
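In practice that means a one-shot snapshot before touching replica counts; a minimal sketch:

```bash
# Per-container CPU and memory right now; find the service actually under pressure
docker stats --no-stream

# Then scale the hot service, e.g. more workers
docker compose up -d --scale celery-worker=3
```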