Digest and escalation
Batch noisy sources into hourly summaries with digest mode, and auto-route unread critical notifications to designated users with escalation rules.
Preferences let you silence what doesn't matter. Digest and escalation are the two features that handle the other end — what to do when there's too much signal, and what to do when nobody's looking.
Digest mode
Digest replaces immediate delivery with batched summaries. Instead of twelve individual notifications across an hour, you get one: "12 scheduled task events — 10 succeeded, 2 failed."
Turning it on
Two fields in notification preferences:
- `digest_enabled` — default off.
- `digest_interval_minutes` — default 60. How often the buffer is flushed.
With digest on, the standard create_notification() call doesn't store a notification — it writes to NotificationBuffer instead. A periodic task flushes the buffer into one summary notification per source.
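That routing decision can be sketched in plain Python. This is an illustrative model, not the real implementation: the in-memory lists stand in for the `Notification` and `NotificationBuffer` tables, and every name besides `create_notification()` and `NotificationBuffer` is an assumption.

```python
from datetime import datetime, timezone

# Stand-ins for the Notification and NotificationBuffer tables.
INBOX = []
BUFFER = []

def create_notification(user, source, severity, message, digest_enabled):
    """Route a new event: buffer it when digest is on, deliver it otherwise."""
    entry = {
        "user": user,
        "source": source,
        "severity": severity,
        "message": message,
        "created_at": datetime.now(timezone.utc),
    }
    if digest_enabled:
        BUFFER.append(entry)   # flushed later by the periodic task
    else:
        INBOX.append(entry)    # stored and delivered immediately
    return entry
```

The caller's code path is identical either way; only the destination changes, which is why digest can be toggled without touching any event producer.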
How the flush works
The notifications.flush_notification_digests Celery task runs on a short cadence. For every user with digest enabled:
- Find buffered notifications older than `digest_interval_minutes`.
- Group them by source (`scheduled_task_success`, `awx_execution_failure`, etc.).
- For each source group:
  - Count successes vs. errors/warnings.
  - Pick the worst severity in the group as the summary type (error > warning > success).
  - Create a single summary notification ("12 scheduled task events — 10 succeeded, 2 failed") with `metadata.digest=true`.
  - If email is enabled and the user's severity threshold allows it, email the summary.
- Delete the flushed buffer entries.
The flushed summary is a normal notification from that point — it shows up in the bell, the center, and email. Its digest=true flag lets the UI render it distinctly (count badge, expandable breakdown).
Worst-severity wins
A batch of ten successes and two failures summarizes as error severity — because errors are the thing you'd care about. The UI still shows the success count; you're just being alerted at the severity level that matters.
This is deliberate: digest mode should never demote your attention. Batching is a convenience, not a reason to miss failures.
What digest does well
- High-frequency, mostly-successful sources. Hourly health checks across 20 APICs generate 480 events a day (typically 479 successes and 1 failure). Digest turns that into one summary per hour instead of 480 individual notifications.
- Noisy sources during migrations. A bulk AWX run spins off dozens of execution notifications. Digest collapses them into a single "25 jobs, 23 succeeded, 2 failed" summary.
- Email hygiene. Without digest, every event is a separate email thread. With digest, one email per hour per source.
What digest does badly
- Time-critical failures. A single AWX failure at 14:05 won't surface until the next digest flush. If the source is "I need to know within minutes," keep digest off for it.
- Interactive workflows. An approval request buffered for 50 minutes is useless — the requester is waiting.
You can't currently opt specific sources into digest while leaving others immediate — digest is a user-level toggle, not per-source. A workable pattern is to turn digest off and rely on high email-severity thresholds instead.
Quiet hours and digest serve different purposes. Quiet hours drop notifications during a window. Digest defers them and eventually delivers a summary. If you want overnight silence with morning summaries, quiet hours alone won't give you the summary — you'd leave quiet hours off and let digest collapse the overnight volume.
Escalation
Escalation auto-routes unread critical notifications to designated users after a configurable window. The intent: catastrophic notifications can't be silently ignored because the primary recipient went home.
The model
An EscalationRule is defined by an admin and lives in the database:
| Field | Purpose |
|---|---|
| Name | Human-readable, shown in admin UI |
| Source | Optional source filter (e.g. only awx_execution_failure). Empty = match all sources. |
| Min severity | Only escalate at or above this level. Default error. |
| Escalate after minutes | Wait this long after the original fires before escalating. Default 30. |
| Escalate to | M2M set of user recipients. |
| Email on escalation | Whether to ping the recipients via email too. |
| Is active | Kill switch; deactivated rules never fire. |
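A plain-Python mirror of those fields can make the rule shape concrete. This is a sketch, not the real database model: `escalate_to` holds usernames here instead of an M2M user set, and the `matches()` helper is an assumption about how the filter fields combine.

```python
from dataclasses import dataclass, field
from typing import Optional

SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}

@dataclass
class EscalationRule:
    """Illustrative stand-in for the admin-defined rule described above."""
    name: str
    source: Optional[str] = None                      # None = match all sources
    min_severity: str = "error"
    escalate_after_minutes: int = 30
    escalate_to: list = field(default_factory=list)   # usernames; M2M users in reality
    email_on_escalation: bool = True
    is_active: bool = True

    def matches(self, notification: dict) -> bool:
        """Source/severity/active gate only; age and read-state checks
        live in the periodic task, not on the rule."""
        if not self.is_active:
            return False
        if self.source and notification["source"] != self.source:
            return False
        return SEVERITY_RANK[notification["severity"]] >= SEVERITY_RANK[self.min_severity]
```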
How escalation actually fires
The notifications.check_escalations Celery task runs every 5 minutes. For each active rule:
- Find notifications with:
  - `is_read=false`
  - `created_at <= now - escalate_after_minutes`
  - Severity >= `min_severity`
  - Not already escalated (`metadata.escalated` not set)
  - Source matches the rule (if the rule specifies one)
- For each match, emit a new notification to every target in `escalate_to` with:
  - Title: `[ESCALATED] {original title}`
  - Message: includes original user, original message truncated, escalation age
  - Source: `system_maintenance`
  - Metadata: `{escalated_from: <original id>, original_user: <username>}`
- Mark the original notification's metadata `escalated=true` so it won't re-escalate.
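One pass of that check for a single rule can be sketched against plain dicts. Illustrative only: field names follow the bullets above, while storage, Celery wiring, and message truncation are omitted.

```python
from datetime import datetime, timedelta, timezone

SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}

def check_escalations(notifications, rule, now=None):
    """Evaluate one active rule against unread notifications."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=rule["escalate_after_minutes"])
    escalations = []
    for n in notifications:
        if n["is_read"] or n["created_at"] > cutoff:
            continue                               # read, or not old enough yet
        if n["metadata"].get("escalated"):
            continue                               # already escalated once
        if rule["source"] and n["source"] != rule["source"]:
            continue
        if SEVERITY_RANK[n["severity"]] < SEVERITY_RANK[rule["min_severity"]]:
            continue
        for target in rule["escalate_to"]:
            escalations.append({
                "user": target,
                "title": f"[ESCALATED] {n['title']}",
                "source": "system_maintenance",
                "metadata": {"escalated_from": n["id"],
                             "original_user": n["user"]},
            })
        n["metadata"]["escalated"] = True          # prevents re-escalation
    return escalations
```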
The escalated notification is a distinct row — the original stays put, and the escalation copy lives in each recipient's inbox with its own read/unread state. Recipients can acknowledge the escalation independently of the original.
Why the original keeps its unread state
Two reasons:
- Escalation doesn't mean the original user is off the hook. They may come back online and deal with it. The unread state is a personal to-do signal.
- Marking it read would hide it from the original user's dashboard — they'd never know an incident fired against them while they were away.
If the original user does read and resolve the notification, that doesn't unwind the escalation (the copy is already out there). The two notifications are independent once the escalation fires.
Source-specific escalations
Many deployments want different escalation behavior per source. You can create multiple rules:
- APIC failure escalation — source=`scheduled_task_failure`, after 15 min, to the network on-call team.
- AWX critical failure escalation — source=`awx_execution_failure`, min_severity=`error`, after 30 min, to the automation owners.
- Catch-all error escalation — source empty, min_severity=`error`, after 60 min, to the platform admins.
Rules are evaluated independently; a single notification can be escalated by multiple rules if all of them match.
The [ESCALATED] convention
The title prefix is literal: [ESCALATED] Original title here. Recipients see it prominently in the bell dropdown — a visual indicator that this isn't a new event, it's an unread-too-long event. The metadata carries escalated_from so the UI could deep-link back to the original (current UI shows the text reference; direct click-through is a roadmap item).
Email on escalation
When `email_on_escalation=true` and the recipient has email enabled and passes the severity gate, the escalation copy is emailed. Most escalations want this — the whole point of escalation is "the in-app bell didn't work, try harder."
Admins usually create escalation rules with email_on_escalation=true. The recipients' own email-severity threshold still applies, so if a recipient has set email_min_severity=error they only get paged for error-class escalations.
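That two-gate check (the rule's flag plus the recipient's own preferences) can be sketched as follows; the function and field names are assumptions mirroring the prose above, not the real API.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}

def should_email_escalation(rule_email_on, recipient_prefs, severity):
    """Email goes out only if the rule asks for it AND the recipient's
    email channel is on AND the severity clears their threshold."""
    if not rule_email_on:
        return False
    if not recipient_prefs["email_enabled"]:
        return False
    return SEVERITY_RANK[severity] >= SEVERITY_RANK[recipient_prefs["email_min_severity"]]
```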
Interactions with other features
Escalation + digest
If an escalation target has digest mode enabled, the escalation is buffered like any other notification. This is almost certainly wrong — you want escalations immediate. Admins typically configure escalation-target users to disable digest for their accounts, or at least set a short digest_interval_minutes (5 minutes) so escalations don't languish.
Escalation + quiet hours
Escalations in a recipient's quiet hours are dropped for that user (quiet hours apply to all notifications, no exceptions). If critical escalations must pierce quiet hours, route to multiple users — someone's quiet hours won't overlap someone else's on-call window.
Escalation + suppression
A notification that got filtered out entirely (per-source opt-out, in_app_enabled=false) was never persisted in the first place, so it can't be escalated from. The escalation starts from the persisted Notification row — upstream suppression means no escalation.
Operational patterns
The classic night-shift pattern
Rule 1 — all errors, escalate after 20 minutes, to the secondary on-call. Email on.
Rule 2 — AWX failures specifically, escalate after 10 minutes, to a senior engineer list.
Primary on-call gets the original. If they acknowledge within 20 min → no escalation. If they don't → secondary gets paged. If the AWX path is failing → senior team gets paged faster.
The compliance pattern
Rule — system_maintenance source, severity error, escalate after 0 minutes, to the compliance admin list, email on.
Any system-level error fires an immediate duplicate to the compliance team. Zero-minute escalation is technically valid — the task runs every 5 min so the practical minimum is 5 min. Use when the compliance team needs visibility into every system-level incident.
The quiet escalation pattern
Rule — source empty, severity warning, escalate after 240 minutes (4 hours), to the user's manager, email off.
Unread warning-class notifications drift up to managers silently. No email noise — just an in-app heads-up that someone on the team is behind on their notifications. Used occasionally in regulated environments.
Troubleshooting
Digest and escalation issues that come up often:
- "Digest notifications never arrive." The flush task runs on a Celery Beat schedule. If Celery Beat is down, digests don't flush. Check
docker compose logs celery-beat. - "My digest is empty but I had events." The buffer is cleared after each flush. An event that arrived in the last flush window already went out; an event in the current window hasn't matured yet (
digest_interval_minuteshasn't passed for it). - "Escalation fired for a notification I already read." Escalation check runs every 5 minutes; if you read the notification between the fire time and the next check, the escalation still goes out. The race window is at most 5 minutes.
- "Escalated notifications keep firing." Check the
metadata.escalatedflag on the original — it should be set after first escalation. If a rule edit reset it, the notification is eligible again. - "I want to escalate to a Slack channel, not a user." Current escalations target users only. Route to a dedicated "slack-bridge" user whose email address forwards to Slack via SMTP-to-Slack. First-class webhook targets are a roadmap item.
- "A rule matches too many notifications." Narrow by
sourceand raisemin_severity. Source-empty + severityinfocatches everything including success pings, which is rarely what you want. - "Escalation fires immediately for old notifications after I create a rule." Expected — the rule applies to any existing unread notification older than
escalate_after_minutes. If you create a rule withescalate_after_minutes=30and there are unread notifications from two hours ago, they escalate on the next check. Consider acknowledging old notifications before enabling aggressive rules.
That covers notifications — in-app, email, digest, escalation. The next major section — Administration — is about the admin-only view: user management, groups, permissions, audit logs, and the system-wide settings that keep everything in line.