
Services

What each of the nine Fabrik services does, how they depend on each other, and how to tell when one is unhealthy.

The Fabrik stack is nine long-running containers. They share one Docker image where possible, one Docker network, and one .env file. This page walks through each service in dependency order — the ones lower in the chain have to be up before the ones above start cleanly.

Dependency graph

              ┌──────────────────────┐
              │  postgres  (required)│
              │  neo4j     (required)│
              │  redis     (required)│
              │  rabbitmq  (required)│
              └──────────┬───────────┘
                         │ healthcheck gates
              ┌──────────┴──────────────────────────────┐
              │ backend · celery-worker · celery-beat    │
              │ event-consumer-job/workflow/output       │
              └──────────┬──────────────────────────────┘

                    ┌────┴────┐
                    │ frontend│
                    └─────────┘

Compose uses depends_on: condition: service_healthy, which means the Python services won't even try to start until Postgres reports pg_isready, Redis replies to PING, RabbitMQ passes rabbitmq-diagnostics ping, and Neo4j responds on port 7474. If one data service is slow to come up, the stack waits — it doesn't fail.
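In compose terms, the gate looks roughly like this (an illustrative fragment, not the literal compose file — only the postgres/backend pair is shown, and the same pattern repeats for the other three data services):

```yaml
services:
  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U fabrik"]
      interval: 10s

  backend:
    depends_on:
      postgres:
        condition: service_healthy   # don't start until the check passes
```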

Data services

postgres

Image: postgres:17-alpine · Memory: 512 MB · Volume: fabrik_postgres_data

Primary relational store for everything non-graph: users, groups, saved queries, scheduled tasks, AWX templates, audit logs, time machine snapshot metadata. Exposed on 127.0.0.1:5432 so the host can inspect it — not the public network.

Healthcheck: pg_isready -U fabrik every 10 s. If this ever reports unhealthy, every Python service will follow within a minute.

Backup: pg_dump against the container (see Upgrading and backup).

neo4j

Image: neo4j:5.26 · Memory: 2 GB · Volumes: fabrik_neo4j_data, fabrik_neo4j_logs

Graph database storing the ACI MIM — class hierarchy, containment rules, property definitions. Populated by the backend on first boot from the MIM registry matching APIC_VERSION, or by explicit MIM imports triggered from the admin UI.

Heap and page cache: NEO4J_HEAP_MAX_SIZE (default 1 GB) and NEO4J_PAGECACHE_SIZE (default 256 MB) together dictate Neo4j's RSS. The 2 GB container limit has about 750 MB of headroom for other JVM needs — don't tune these past that.
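The arithmetic is worth making explicit: heap plus page cache plus JVM headroom must stay under the container limit. A quick sanity check with the defaults (values in MB):

```shell
HEAP=1024        # NEO4J_HEAP_MAX_SIZE default (1 GB)
PAGECACHE=256    # NEO4J_PAGECACHE_SIZE default
HEADROOM=750     # other JVM needs, per the note above
LIMIT=2048       # container memory limit (2 GB)

TOTAL=$((HEAP + PAGECACHE + HEADROOM))
if [ "$TOTAL" -le "$LIMIT" ]; then
  echo "fits: ${TOTAL} MB of ${LIMIT} MB"
else
  echo "over budget by $((TOTAL - LIMIT)) MB"
fi
```

If you raise either knob, raise the container limit by the same amount.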

Healthcheck: HTTP check against http://localhost:7474 with a 30 s startup grace period. Neo4j takes the longest to warm up; that's normal.

redis

Image: redis:8-alpine · Memory: 256 MB · Volume: fabrik_redis_data

Four overlapping roles:

  1. Celery broker (/0) — task queue for backend → worker dispatch.
  2. Celery result backend (/1) — short-lived result storage.
  3. Django Channels layer — WebSocket group membership and message routing.
  4. MIM cache — short-TTL responses from Neo4j (see the cache tiers in backend/mim/cache.py).

No authentication by default — Redis is only reachable from other containers on fabrik-network.

Healthcheck: redis-cli ping.
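You can peek at each role from the host. This is a sketch that assumes the default Celery/Redis transport, where each queue is a Redis list keyed by its queue name:

```shell
# Depth of the default Celery queue and one of the named queues (DB 0)
docker compose exec redis redis-cli -n 0 llen celery
docker compose exec redis redis-cli -n 0 llen query_exec

# How many short-lived task results are currently held (DB 1)
docker compose exec redis redis-cli -n 1 dbsize
```

A queue depth that only grows means tasks are being dispatched faster than workers drain them.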

rabbitmq

Image: rabbitmq:4.1-management-alpine · Memory: 1 GB · Volume: fabrik_rabbitmq_data

Dedicated broker for AWX event ingestion. AWX webhooks land on the backend, which publishes them to three queues: awx.job.status, awx.workflow.status, awx.job.output. Three separate consumers drain those queues and update the database.

Why not Redis: AWX can burst hundreds of output chunks per second during a big playbook run. Keeping that traffic off Redis prevents it from stalling Celery task dispatch and WebSocket broadcasts.

Management UI: Bound to 127.0.0.1:15672 (not public). Log in with RABBITMQ_USER / RABBITMQ_PASSWORD for queue depth, consumer counts, and message rates.
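The same numbers are available from the CLI without opening the UI — useful from a headless host:

```shell
# Queue depth and consumer counts for every queue on the default vhost
docker compose exec rabbitmq rabbitmqctl list_queues name messages consumers
```

Three consumers, one per awx.* queue, with messages near zero is the healthy steady state.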

Healthcheck: rabbitmq-diagnostics ping.

Application services

backend

Image: fabrik-backend:latest (built from backend/Dockerfile) · Memory: 512 MB

Django 5 running under Daphne ASGI. Handles every HTTP request and every WebSocket connection. The container entrypoint runs migrate → bootstrap_mim → daphne, so every restart:

  1. Applies pending Django migrations (idempotent).
  2. Seeds Neo4j with the MIM matching APIC_VERSION if the graph is empty.
  3. Starts the ASGI server.

Exposed port: ${BACKEND_PORT:-8000} on the host. In production, put nginx in front and keep this bound to 127.0.0.1.

Health: Reachable on GET /api/health/ — returns {"status": "ok"} plus version info. Scrape-friendly.
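A quick check from the host, assuming the default BACKEND_PORT of 8000:

```shell
# Fails with a non-zero exit code if the backend is down or returning errors
curl -fsS http://127.0.0.1:8000/api/health/
```

In production, point this at the frontend instead, since nginx proxies /api/ through.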

celery-worker

Image: fabrik-backend:latest (same image as backend) · Memory: 1 GB

The workhorse. One container runs a single Celery process with configurable concurrency (CELERY_WORKER_CONCURRENCY, default 2). It subscribes to seven queues:

celery, query_exec, scheduled, awx_monitor, awx_exec, maintenance, mim_import

Tasks route to a queue based on @shared_task(queue=...) in the code — you don't configure routing in .env. Scale by running more worker containers rather than raising concurrency beyond 4 (Python GIL trade-offs).
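To see the routing in effect rather than in the code, ask a running worker which queues it consumes. The -A target fabrik here is an assumption — use whatever Celery app module the backend actually defines:

```shell
docker compose exec celery-worker celery -A fabrik inspect active_queues
```

All seven queues should appear; a missing queue means its tasks will sit in Redis unconsumed.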

celery-beat

Image: fabrik-backend:latest · Memory: 256 MB

Scheduler. One process, one replica — running two would double-fire every scheduled task. Reads the schedule from the database (django_celery_beat tables) and emits tasks onto Redis for workers to pick up.

Beat has no healthcheck beyond process liveness; watch its logs for Scheduler: Sending due task... lines when you expect a job to fire.
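A quick way to confirm the scheduler is firing:

```shell
# Recent dispatches from beat; empty output means nothing has fired lately
docker compose logs --tail=200 celery-beat | grep "Sending due task"
```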

event-consumer-job / workflow / output

Three siblings, same image, different --queue argument. Each subscribes to one RabbitMQ queue and writes AWX updates into Postgres:

| Consumer | Queue | What it writes |
| --- | --- | --- |
| event-consumer-job | awx.job.status | Job lifecycle events (started, successful, failed) |
| event-consumer-workflow | awx.workflow.status | Workflow-level events |
| event-consumer-output | awx.job.output | Streaming stdout chunks — highest volume |

Each has its own healthcheck that opens a Pika connection to RabbitMQ every 30 s. A failed healthcheck usually means RabbitMQ is unreachable, not that the consumer itself is broken.
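To tell the two cases apart, check the health state Docker reports and then check RabbitMQ itself:

```shell
# Health state of all three consumers as Docker sees it
docker inspect --format '{{.Name}}: {{.State.Health.Status}}' \
  $(docker compose ps -q event-consumer-job event-consumer-workflow event-consumer-output)

# If they're unhealthy, verify the broker before blaming the consumers
docker compose exec rabbitmq rabbitmq-diagnostics ping
```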

frontend

Image: fabrik-frontend:latest · Memory: 256 MB

A multi-stage build: stage 1 runs vite build, stage 2 is nginx:alpine serving the static SPA from /usr/share/nginx/html and reverse-proxying /api/, /admin/, /static/, and /ws/ to the backend container on port 8000. nginx listens on port 80 inside the container; map it to the host with FRONTEND_PORT (default 80).

Because all browser traffic terminates here, you usually don't expose the backend port externally. The frontend is the only public entry point; the backend is reachable only through the nginx proxy on the same container network.

Optional services

Started with --profile flags, off by default.

docs

Profile: docs · Image: node:22-alpine · Memory: 2 GB

The Fumadocs documentation site you're reading right now. Start it with docker compose --profile docs up -d docs and reach it at http://localhost:4000. Useful if you want the docs bundled inside the same deployment instead of relying on the public site.

gitea

Profile: scm-gitea · Image: gitea/gitea:1.25.4

Lightweight Git server for AWX playbook source control. Only start this if you need AWX to pull playbooks from an SCM and don't already have one (GitLab, GitHub, etc.).

mailpit

Image: axllent/mailpit:latest · Memory: 128 MB

Local SMTP sink for testing. Catches outbound mail from Fabrik and shows it in a web UI at http://localhost:8025. Handy for development — remove or disable in production.

Reading the logs

Every service logs JSON lines to the Docker daemon with a 10 MB × 3 file rotation. Common patterns:

# Everything, followed
docker compose logs -f

# Just one service
docker compose logs -f backend

# Last 100 lines from workers and beat
docker compose logs --tail=100 celery-worker celery-beat

# Event consumers (all three)
docker compose logs -f event-consumer-job event-consumer-workflow event-consumer-output

Scaling guidance

  • More users / more queries running concurrently → scale celery-worker (e.g. docker compose up --scale celery-worker=3) or raise CELERY_WORKER_CONCURRENCY.
  • Lots of AWX output → RabbitMQ consumers are the bottleneck. Run them with higher concurrency or add replicas.
  • Big MIM / many APIC versions → raise NEO4J_HEAP_MAX_SIZE and NEO4J_PAGECACHE_SIZE, grow the container memory limit to match.
  • Heavy audit log traffic → Postgres. Consider a managed Postgres and point DATABASE_URL at it.

Always docker stats first, scale second.
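A one-shot snapshot is usually enough to see which container is actually under pressure:

```shell
# Per-container memory and CPU, once, without the live refresh
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}"
```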