
Services

What each of the nine Fabrik services does, how they depend on each other, and how to tell when one is unhealthy.

The Fabrik stack is nine long-running containers. They share one Docker image where possible, one Docker network, and one .env file. This page walks through each service in dependency order — the ones lower in the chain have to be up before the ones above start cleanly.

Dependency graph

              ┌──────────────────────┐
              │  postgres  (required)│
              │  neo4j     (required)│
              │  redis     (required)│
              │  rabbitmq  (required)│
              └──────────┬───────────┘
                         │ healthcheck gates
              ┌──────────┴──────────────────────────────┐
              │ backend · celery-worker · celery-beat    │
              │ event-consumer-job/workflow/output       │
              └──────────┬──────────────────────────────┘

                    ┌────┴────┐
                    │ frontend│
                    └─────────┘

Compose uses depends_on: condition: service_healthy, which means the Python services won't even try to start until Postgres reports pg_isready, Redis replies to PING, RabbitMQ passes rabbitmq-diagnostics ping, and Neo4j responds on port 7474. If one data service is slow to come up, the stack waits — it doesn't fail.
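In compose terms, the gate looks roughly like this (an illustrative fragment, not the literal compose file — only the postgres/backend pair is shown, and the same pattern repeats for the other three data services):

```yaml
services:
  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U fabrik"]
      interval: 10s

  backend:
    depends_on:
      postgres:
        condition: service_healthy   # don't start until the check passes
```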

Data services

postgres

Image: postgres:17-alpine · Memory: 512 MB · Volume: fabrik_postgres_data

Primary relational store for everything non-graph: users, groups, saved queries, scheduled tasks, AWX templates, audit logs, time machine snapshot metadata. Exposed on 127.0.0.1:5432 so the host can inspect it — not the public network.

Healthcheck: pg_isready -U fabrik every 10 s. If this ever reports unhealthy, every Python service will follow within a minute.

Backup: pg_dump against the container (see Upgrading and backup).

neo4j

Image: neo4j:5.26 · Memory: 2 GB · Volumes: fabrik_neo4j_data, fabrik_neo4j_logs

Graph database storing the ACI MIM — class hierarchy, containment rules, property definitions. Populated by the backend on first boot from the MIM registry matching APIC_VERSION, or by explicit MIM imports triggered from the admin UI.

Heap and page cache: NEO4J_HEAP_MAX_SIZE (default 1 GB) and NEO4J_PAGECACHE_SIZE (default 256 MB) together dictate Neo4j's RSS. The 2 GB container limit has about 750 MB of headroom for other JVM needs — don't tune these past that.
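The arithmetic is worth making explicit: heap plus page cache plus JVM headroom must stay under the container limit. A quick sanity check with the defaults (values in MB):

```shell
HEAP=1024        # NEO4J_HEAP_MAX_SIZE default (1 GB)
PAGECACHE=256    # NEO4J_PAGECACHE_SIZE default
HEADROOM=750     # other JVM needs, per the note above
LIMIT=2048       # container memory limit (2 GB)

TOTAL=$((HEAP + PAGECACHE + HEADROOM))
if [ "$TOTAL" -le "$LIMIT" ]; then
  echo "fits: ${TOTAL} MB of ${LIMIT} MB"
else
  echo "over budget by $((TOTAL - LIMIT)) MB"
fi
```

If you raise either knob, raise the container limit by the same amount.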

Healthcheck: HTTP check against http://localhost:7474 with a 30 s startup grace period. Neo4j takes the longest to warm up; that's normal.

redis

Image: redis:8-alpine · Memory: 256 MB · Volume: fabrik_redis_data

Four overlapping roles:

  1. Celery broker (/0) — task queue for backend → worker dispatch.
  2. Celery result backend (/1) — short-lived result storage.
  3. Django Channels layer — WebSocket group membership and message routing.
  4. MIM cache — short-TTL responses from Neo4j (see the cache tiers in backend/mim/cache.py).

No authentication by default — Redis is only reachable from other containers on fabrik-network.

Healthcheck: redis-cli ping.
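You can peek at each role from the host. This is a sketch that assumes the default Celery/Redis transport, where each queue is a Redis list keyed by its queue name:

```shell
# Depth of the default Celery queue and one of the named queues (DB 0)
docker compose exec redis redis-cli -n 0 llen celery
docker compose exec redis redis-cli -n 0 llen query_exec

# How many short-lived task results are currently held (DB 1)
docker compose exec redis redis-cli -n 1 dbsize
```

A queue depth that only grows means tasks are being dispatched faster than workers drain them.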

rabbitmq

Image: rabbitmq:4.1-management-alpine · Memory: 1 GB · Volume: fabrik_rabbitmq_data

Dedicated broker for AWX event ingestion. AWX webhooks land on the backend, which publishes them to three queues: awx.job.status, awx.workflow.status, awx.job.output. Three separate consumers drain those queues and update the database.

Why not Redis: AWX can burst hundreds of output chunks per second during a big playbook run. Keeping that traffic off Redis prevents it from stalling Celery task dispatch and WebSocket broadcasts.

Management UI: Bound to 127.0.0.1:15672 (not public). Log in with RABBITMQ_USER / RABBITMQ_PASSWORD for queue depth, consumer counts, and message rates.
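The same numbers are available from the CLI without opening the UI — useful from a headless host:

```shell
# Queue depth and consumer counts for every queue on the default vhost
docker compose exec rabbitmq rabbitmqctl list_queues name messages consumers
```

Three consumers, one per awx.* queue, with messages near zero is the healthy steady state.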

Healthcheck: rabbitmq-diagnostics ping.

Application services

backend

Image: fabrik-backend:latest (built from backend/Dockerfile) · Memory: 512 MB

Django 5 running under Daphne ASGI. Handles every HTTP request and every WebSocket connection. The container entrypoint runs migrate → bootstrap_mim → daphne, so every restart:

  1. Applies pending Django migrations (idempotent).
  2. Seeds Neo4j with the MIM matching APIC_VERSION if the graph is empty.
  3. Starts the ASGI server.

Exposed port: ${BACKEND_PORT:-8000} on the host. In production, put nginx in front and keep this bound to 127.0.0.1.

Health: Reachable on GET /api/health/ — returns {"status": "ok"} plus version info. Scrape-friendly.
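A quick check from the host, assuming the default BACKEND_PORT of 8000:

```shell
# Fails with a non-zero exit code if the backend is down or returning errors
curl -fsS http://127.0.0.1:8000/api/health/
```

In production, point this at the frontend instead, since nginx proxies /api/ through.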

celery-worker

Image: fabrik-backend:latest (same image as backend) · Memory: 1 GB

The workhorse. One container runs a single Celery process with configurable concurrency (CELERY_WORKER_CONCURRENCY, default 2). It subscribes to seven queues:

celery, query_exec, scheduled, awx_monitor, awx_exec, maintenance, mim_import

Tasks route to a queue based on @shared_task(queue=...) in the code — you don't configure routing in .env. Scale by running more worker containers rather than raising concurrency beyond 4 (Python GIL trade-offs).
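To see the routing in effect rather than in the code, ask a running worker which queues it consumes. The -A target fabrik here is an assumption — use whatever Celery app module the backend actually defines:

```shell
docker compose exec celery-worker celery -A fabrik inspect active_queues
```

All seven queues should appear; a missing queue means its tasks will sit in Redis unconsumed.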

celery-beat

Image: fabrik-backend:latest · Memory: 256 MB

Scheduler. One process, one replica — running two would double-fire every scheduled task. Reads the schedule from the database (django_celery_beat tables) and emits tasks onto Redis for workers to pick up.

Beat has no healthcheck beyond process liveness; watch its logs for Scheduler: Sending due task... lines when you expect a job to fire.
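A quick way to confirm the scheduler is firing:

```shell
# Recent dispatches from beat; empty output means nothing has fired lately
docker compose logs --tail=200 celery-beat | grep "Sending due task"
```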

event-consumer-job / workflow / output

Three siblings, same image, different --queue argument. Each subscribes to one RabbitMQ queue and writes AWX updates into Postgres:

| Consumer | Queue | What it writes |
| --- | --- | --- |
| event-consumer-job | awx.job.status | Job lifecycle events (started, successful, failed) |
| event-consumer-workflow | awx.workflow.status | Workflow-level events |
| event-consumer-output | awx.job.output | Streaming stdout chunks — highest volume |

Each has its own healthcheck that opens a Pika connection to RabbitMQ every 30 s. A failed healthcheck usually means RabbitMQ is unreachable, not that the consumer itself is broken.
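To tell the two cases apart, check the health state Docker reports and then check RabbitMQ itself:

```shell
# Health state of all three consumers as Docker sees it
docker inspect --format '{{.Name}}: {{.State.Health.Status}}' \
  $(docker compose ps -q event-consumer-job event-consumer-workflow event-consumer-output)

# If they're unhealthy, verify the broker before blaming the consumers
docker compose exec rabbitmq rabbitmq-diagnostics ping
```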

frontend

Image: fabrik-frontend:latest · Memory: 256 MB

A multi-stage build: stage 1 runs vite build, stage 2 is nginx:alpine serving the static SPA from /usr/share/nginx/html and reverse-proxying /api/, /admin/, /static/, and /ws/ to the backend container on port 8000. nginx listens on port 80 inside the container; map it to the host with FRONTEND_PORT (default 80).

Because all browser traffic terminates here, you usually don't expose the backend port externally. The frontend is the only public entry point; the backend is reachable only through the nginx proxy on the same container network.

Optional services

Started with --profile flags, off by default.

docs

Profile: docs · Image: node:22-alpine · Memory: 2 GB

The Fumadocs documentation site you're reading right now. Start it with docker compose --profile docs up -d docs and reach it at http://localhost:4000. Useful if you want the docs bundled inside the same deployment instead of relying on the public site.

gitea

Profile: scm-gitea · Image: gitea/gitea:1.25.4

Lightweight Git server for AWX playbook source control. Only start this if you need AWX to pull playbooks from an SCM and don't already have one (GitLab, GitHub, etc.).

mailpit

Image: axllent/mailpit:latest · Memory: 128 MB

Local SMTP sink for testing. Catches outbound mail from Fabrik and shows it in a web UI at http://localhost:8025. Handy for development — remove or disable in production.

Reading the logs

Every service logs JSON lines to the Docker daemon with a 10 MB × 3 file rotation. Common patterns:

# Everything, followed
docker compose logs -f

# Just one service
docker compose logs -f backend

# Last 100 lines from workers and beat
docker compose logs --tail=100 celery-worker celery-beat

# Event consumers (all three)
docker compose logs -f event-consumer-job event-consumer-workflow event-consumer-output

Scaling guidance

  • More users / more queries running concurrently → scale celery-worker (e.g. docker compose up --scale celery-worker=3) or raise CELERY_WORKER_CONCURRENCY.
  • Lots of AWX output → RabbitMQ consumers are the bottleneck. Run them with higher concurrency or add replicas.
  • Big MIM / many APIC versions → raise NEO4J_HEAP_MAX_SIZE and NEO4J_PAGECACHE_SIZE, grow the container memory limit to match.
  • Heavy audit log traffic → Postgres. Consider a managed Postgres and point DATABASE_URL at it.

Always docker stats first, scale second.
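A one-shot snapshot is usually enough to see which container is actually under pressure:

```shell
# Per-container memory and CPU, once, without the live refresh
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}"
```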