Fabrik Deployment

Upgrading and backup

Backup Postgres, Neo4j, and the MIM cache. Upgrade Fabrik safely with rollback. Rotate secrets without losing credentials.

Operating Fabrik long-term means two recurring jobs: taking backups you can actually restore, and upgrading without losing data. This page is the operational runbook for both.

What needs backing up

| Data | Lives in | Changes | Recovery cost if lost |
| --- | --- | --- | --- |
| PostgreSQL database | fabrik_postgres_data volume | Constantly — every user action | Catastrophic — users, queries, schedules, audit trail gone |
| Neo4j graph | fabrik_neo4j_data volume | Rarely — only on MIM imports | Rebuildable from the MIM registry |
| MIM cache files | fabrik_mim_cache volume | On MIM import | Rebuildable from the MIM registry |
| .env file | Repository working tree | Only when you edit it | Critical — ENCRYPTION_KEY is irreplaceable |
| nginx certs | nginx/ssl/ | On cert rotation | Reissuable |
| RabbitMQ volume | fabrik_rabbitmq_data | Transient — queues drain | None — queues rebuild from AWX webhooks |
| Redis volume | fabrik_redis_data | Transient — cache only | None — cache rebuilds on demand |

In practice, backups focus on Postgres and .env. Everything else is either rebuildable or ephemeral.

Losing ENCRYPTION_KEY is unrecoverable. Every stored APIC password, AWX token, and TOTP secret is Fernet-encrypted with it. Restoring a Postgres dump against a different ENCRYPTION_KEY leaves you with ciphertext no process can decrypt. Back up .env with the same care as Postgres.
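
A quick way to convince yourself: the following standalone sketch (not Fabrik code; it assumes Python 3 with the `cryptography` package, which provides Fernet) encrypts a value with one key and tries to decrypt it with another.

```shell
result=$(python3 - <<'EOF'
from cryptography.fernet import Fernet, InvalidToken

old_key = Fernet.generate_key()   # stand-in for the original ENCRYPTION_KEY
new_key = Fernet.generate_key()   # stand-in for a replacement key

token = Fernet(old_key).encrypt(b"apic-password")
try:
    # Fernet authenticates before decrypting, so a key mismatch is detected
    Fernet(new_key).decrypt(token)
except InvalidToken:
    print("undecryptable with the new key")
EOF
)
echo "$result"
```

This is exactly the state a restored dump is in when paired with the wrong key: valid-looking ciphertext that no process can ever decrypt.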

PostgreSQL backup

Daily dump

Run from the host, scheduled in cron:

docker compose exec -T postgres pg_dump \
  -U fabrik \
  -d fabrik \
  --clean --if-exists \
  | gzip > /backup/fabrik-$(date +%F).sql.gz

-T disables TTY allocation so cron doesn't hang. --clean --if-exists makes the dump idempotent — restoring it drops and recreates objects cleanly.

Retain 14 daily, 8 weekly, 12 monthly copies off-host. Exact retention is a compliance question; the schedule is a defense-in-depth one.
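
Pruning the daily tier can be a one-liner paired with the dump in cron. A sketch, assuming each tier lives in its own directory (the paths and tier layout are illustrative, not Fabrik defaults):

```shell
# Illustrative prune for the daily tier; point this at your real backup dir
DAILY_DIR=${DAILY_DIR:-/tmp/fabrik-backup-demo}   # e.g. /backup/daily in production
mkdir -p "$DAILY_DIR"

# Delete daily dumps older than 14 days; weekly and monthly tiers would get
# their own directories with larger -mtime values
find "$DAILY_DIR" -name 'fabrik-*.sql.gz' -mtime +14 -print -delete
```

Run it from the same crontab as the dump so retention never drifts from the backup schedule.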

Restore

Stop the stack first so nothing writes during restore:

docker compose stop backend celery-worker celery-beat \
  event-consumer-job event-consumer-workflow event-consumer-output

gunzip -c /backup/fabrik-2026-04-22.sql.gz \
  | docker compose exec -T postgres psql -U fabrik -d fabrik

docker compose start backend celery-worker celery-beat \
  event-consumer-job event-consumer-workflow event-consumer-output

After restore, verify: log in, open Scheduled Tasks, confirm recent queries are present.

Neo4j backup

Neo4j isn't critical — the MIM graph reimports from the registry on next start if empty. But a backup saves import time and lets you pin a known-good MIM version:

# Online backup via cypher-shell APOC export (requires the APOC plugin
# with apoc.export.file.enabled=true set in the Neo4j config)
docker compose exec neo4j \
  cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
  "CALL apoc.export.cypher.all('/data/neo4j-backup.cypher', {})"

docker cp fabrik-neo4j:/data/neo4j-backup.cypher /backup/

Or simpler: stop the container, tar the volume, start it again. Slower but bulletproof.
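
The stop-and-tar variant might look like this (a sketch; the volume name and backup path assume the default compose project and the /backup directory used elsewhere on this page):

```shell
# Date-stamped archive of the Neo4j data volume
ARCHIVE="/backup/neo4j_data_$(date +%F).tar.gz"

docker compose stop neo4j
docker run --rm \
  -v fabrik_neo4j_data:/data \
  -v /backup:/backup \
  alpine tar czf "$ARCHIVE" -C /data .
docker compose start neo4j
```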

.env backup

# Copy to a secrets vault, not just another directory on the same host
cp /opt/fabrik/.env /secure-backup/fabrik.env.$(date +%F)

Store this somewhere you'd store AWS credentials — 1Password, Vault, a sealed secret in your management plane. Anyone with .env can decrypt every credential in the Postgres dump.

Upgrade procedure

Fabrik upgrades are pull-and-restart. The entrypoint runs pending migrations automatically. The procedure:

Take a Postgres backup. See above. Don't skip — upgrades run migrations that can be hard to reverse.

Back up .env. You'll need it if you have to roll back, and you shouldn't be editing it in place anyway.

Pull the new code.

cd /opt/fabrik
git fetch --tags
git checkout v1.2.0   # or whatever the target version is

Diff .env.example against your .env.

diff <(grep -oE '^[A-Z_]+' .env.example | sort -u) \
     <(grep -oE '^[A-Z_]+' .env | sort -u)

Add any new keys the release introduced. Release notes call these out.

Rebuild and restart.

docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  up -d --build

Compose rebuilds only what changed. Downtime is usually under a minute — backend restarts, migrations apply, Celery reconnects.

Verify. Check docker compose ps — every service should be healthy. Hit /api/health/ and log in. Run a saved query. Check scheduled tasks list.
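
The verification pass is easy to script. A sketch — `BASE_URL` is an assumption, point it at your install:

```shell
BASE_URL=${BASE_URL:-https://fabrik.example.com}

check() {
  # Print the HTTP status for a path; -k tolerates self-signed certs
  status=$(curl -ks --max-time 5 -o /dev/null -w '%{http_code}' "${BASE_URL}${1}")
  echo "${1}: HTTP ${status}"
}

docker compose ps          # every service should report healthy
check /api/health/         # expect 200 after the upgrade
```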

Rollback

If the upgrade breaks something you can't live with:

# Stop the new stack
docker compose down

# Restore Postgres from the pre-upgrade backup
gunzip -c /backup/fabrik-pre-upgrade.sql.gz \
  | docker compose exec -T postgres psql -U fabrik -d fabrik

# Restore the old code
git checkout <previous-tag>

# Start the old stack
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

If the new version applied migrations that the old code doesn't understand, restoring Postgres to the pre-upgrade state is the only safe path — don't try to downgrade Django against a migrated database.

Rotating secrets

DJANGO_SECRET_KEY

Safe to rotate. Invalidates all active sessions and JWTs — users need to log in again. No data loss.

# Generate new
python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"

# Update .env, restart backend + workers + beat + consumers
docker compose restart backend celery-worker celery-beat \
  event-consumer-job event-consumer-workflow event-consumer-output

ENCRYPTION_KEY

Not safe to rotate in place. Every encrypted credential in Postgres was encrypted with the old key. To rotate:

  1. Go to Settings → Integrations and note every APIC connection, AWX connection, and AI provider key.
  2. Delete them from Fabrik (the credentials, not the users/groups).
  3. Take a fresh Postgres backup (sanity).
  4. Update ENCRYPTION_KEY in .env.
  5. Restart backend, workers, beat, and consumers.
  6. Re-enter every credential you noted in step 1.

There is no tooling for in-place key rotation. Rotate only when you have reason to (suspected leak), not on a schedule.

Database passwords

Postgres, Neo4j, and RabbitMQ passwords can be rotated with a slightly longer dance: stop dependent services, change the password inside the database container, update .env, restart everything. Test against a staging environment first — Neo4j in particular can be stubborn if the password doesn't match its stored state.
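
For Postgres, that dance might look like the following sketch (service, database, and user names assume the default compose project; take a backup first):

```shell
# Generate a new password and stop everything that holds a DB connection
NEW_PASS=$(openssl rand -base64 24)
docker compose stop backend celery-worker celery-beat \
  event-consumer-job event-consumer-workflow event-consumer-output

# Change the password inside the running Postgres container
docker compose exec -T postgres \
  psql -U fabrik -d fabrik -c "ALTER USER fabrik WITH PASSWORD '${NEW_PASS}';"

# Update the Postgres password entry in .env to match, then bring it all back
docker compose up -d
```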

Volume migration

Named volumes (e.g. fabrik_postgres_data) are host-local. Moving Fabrik to a new host means moving the volumes with it:

# On the old host
docker run --rm -v fabrik_postgres_data:/data -v $(pwd):/backup alpine \
  tar czf /backup/postgres_data.tar.gz -C /data .

# On the new host, after docker compose up has created the volume
docker run --rm -v fabrik_postgres_data:/data -v /transfer:/backup alpine \
  tar xzf /backup/postgres_data.tar.gz -C /data

Stop the stack on both ends during the copy. Repeat for fabrik_neo4j_data and fabrik_rabbitmq_data. Don't bother with Redis — rebuilds itself.
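
The per-volume tar step is worth looping on the old host. A sketch over the three volumes worth moving, staging the archives in the /transfer directory used above:

```shell
VOLUMES="fabrik_postgres_data fabrik_neo4j_data fabrik_rabbitmq_data"

# Archive each named volume into /transfer for copying to the new host
for vol in $VOLUMES; do
  docker run --rm \
    -v "${vol}:/data" \
    -v /transfer:/backup \
    alpine tar czf "/backup/${vol}.tar.gz" -C /data .
done
```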

What to monitor

Minimum monitoring for a production install:

  • Disk usage on the host — Postgres and audit logs grow.
  • /api/health/ returning 200 at one-minute intervals.
  • Container restart count. If something is flapping, the restart count climbs. docker events streams them.
  • Celery queue depth. Watch the RabbitMQ management UI (for AWX events) and Redis LLEN celery (for tasks). Persistent backlog means add workers.
  • Scheduled task success rate. Settings → Scheduled Tasks shows the failure streak per task — surface it in your own dashboards if you care.
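
The Celery queue-depth check above turns into a cron probe easily. A sketch — the threshold is arbitrary and the service names assume the default compose project:

```shell
THRESHOLD=${THRESHOLD:-100}

# Celery task backlog in Redis; falls back to 0 if the probe itself fails
depth=$(docker compose exec -T redis redis-cli LLEN celery 2>/dev/null || echo 0)
depth=${depth:-0}

if [ "$depth" -gt "$THRESHOLD" ]; then
  echo "ALERT: celery backlog at $depth (threshold $THRESHOLD)"
else
  echo "celery backlog: $depth"
fi
```

Wire the ALERT line into whatever pages you — a persistent backlog is the signal to add workers.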

Fabrik doesn't ship a Prometheus exporter. If you want one, that's a reasonable contribution — Django has django-prometheus and Celery has celery-exporter.