Fabrik Time Machine

Capturing snapshots

How to take a snapshot — manually from the results panel, automatically via a scheduled task, or programmatically — and what happens when a capture is rejected.

A snapshot is a frozen copy of a query's output at a specific moment. This page walks through the three ways to create one, how deduplication decides whether it actually gets stored, and what to do when Fabrik refuses.

The three capture paths

| Path | When it fires | Typical use |
| --- | --- | --- |
| Manual | User clicks Save Snapshot on the query results panel | Before/after pairs around a change window |
| Scheduled | A scheduled task executes a saved query and captures the result | Daily drift detection, compliance timelines |
| Ad-hoc | An unsaved query's result is explicitly captured | Exploration, debugging one-off issues |

All three land in the same QueryExecutionSnapshot table and go through the same capture pipeline — the only difference is how the trigger happens.

Manual capture

Open a saved query, hit Run, and when results come back you'll see a Save Snapshot button in the results toolbar (next to Export and Copy JSON).

Clicking it:

  1. Sends the raw result to the backend.
  2. Backend hashes it, checks for duplicates, enforces size limits.
  3. If everything passes, a snapshot is written and the button shows a confirmation.

Annotation dialog. After save, you can attach a note and a label to the snapshot — "Pre-migration baseline," "After BGP change," "CR-4821 before". These stay with the snapshot forever and show up in the snapshot list, making it easy to find the one that mattered.

Manual captures are the right tool when:

  • You're about to make a change and want a known-good baseline.
  • Something looks wrong and you want to freeze the current state before digging in.
  • A one-off audit needs a specific point-in-time record.

Scheduled capture

This is the heavy-lifting path for drift detection. A scheduled task runs a saved query on a cadence (hourly, daily, weekly, etc.) and can capture the result on every fire.

To enable it on a task, open the task and toggle Capture snapshots on the task form. Every subsequent execution adds a snapshot for each APIC connection configured on the task.

Typical setups:

  • Daily fault count, 06:00 server time, all APICs → up to 365 snapshots/year; dedup collapses streaks of stable state to one row.
  • Hourly endpoint inventory, single APIC → 24 snapshots/day; drift on the endpoint table shows as has_changes=true.
  • Weekly full tenant dump, Sunday 02:00 → slow-moving compliance record.

Each scheduled capture records execution_type='scheduled' plus the scheduled_task_id and scheduled_task_execution_id. These fields make it trivial to trace "this snapshot came from that task run."
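
With those fields, tracing snapshots back to a task run is a plain filter. A toy illustration in Python, treating snapshot rows as dicts (the field names follow the description above; the data is invented):

```python
# Toy snapshot rows; real rows live in QueryExecutionSnapshot.
snapshots = [
    {"id": 1, "execution_type": "scheduled",
     "scheduled_task_id": 7, "scheduled_task_execution_id": 901},
    {"id": 2, "execution_type": "manual",
     "scheduled_task_id": None, "scheduled_task_execution_id": None},
    {"id": 3, "execution_type": "scheduled",
     "scheduled_task_id": 7, "scheduled_task_execution_id": 902},
]

def snapshots_for_task_run(rows, task_id, execution_id):
    """All snapshots produced by one execution of one scheduled task."""
    return [r for r in rows
            if r["execution_type"] == "scheduled"
            and r["scheduled_task_id"] == task_id
            and r["scheduled_task_execution_id"] == execution_id]
```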

Scheduled captures are fire-and-forget. If the snapshot fails (size limit, APIC error response, storage exhaustion), the scheduled task itself still reports success — the capture is a side effect, not a prerequisite. Check the snapshot logs separately.

Ad-hoc capture (unsaved queries)

An unsaved query — something you built on the canvas but haven't hit Save on — can still be snapshotted. The snapshot stores the query_structure (the full flow JSON) so the query is reconstructable later.

What's different:

  • saved_query_id is null.
  • class_name carries the ACI class as the fallback identity.
  • The snapshot does not appear in the main Time Machine query list (no stable identity to group under).
  • It's still findable via direct link, snapshot detail views, and the admin interface.

Ad-hoc captures are more of a safety net than a primary workflow. If you care about the snapshot long-term, save the query first — it's cheaper to carry a saved query than to track down rogue ad-hoc snapshots months later.

What the capture pipeline actually does

Every capture path converges on TimeMachineService.capture_snapshot(). Its sequence:

  1. Load settings. User-specific or global (TimeMachineSettings.get_for_user).
  2. Serialize to JSON and compute byte size.
  3. Reject APIC error responses. Any result with a messages array containing error or warning severity is dropped — saving it would poison the next diff.
  4. Enforce size limit. Bigger than max_snapshot_size_mb? Refused with snapshot_too_large.
  5. Compute SHA-256 hash.
  6. Dedup check. If store_duplicates is off and the hash matches the previous snapshot for the same (saved_query, APIC), skip.
  7. Set has_changes. True if the previous snapshot's hash differs; false otherwise (or for the first snapshot).
  8. Persist with all the query metadata — version hash, major/minor, class name, connection name (denormalized so history survives a connection delete).
  9. Return result.

The service returns a structured response — {success, skipped?, snapshot_id?, is_duplicate?, has_changes?, error?} — so callers know exactly whether a row was written, a duplicate was suppressed, or an error blocked the save.
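
The sequence can be sketched in a few lines of Python. This is a minimal illustration of the logic described above, not the actual TimeMachineService code; settings loading and persistence are omitted, and the key-sorted serialization and function signature are assumptions:

```python
import hashlib
import json

def capture_snapshot(result, previous_hash=None,
                     max_snapshot_size_mb=10, store_duplicates=False):
    """Minimal sketch of the capture sequence; persistence is omitted."""
    # 2. Serialize to JSON and compute byte size.
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    size_mb = len(payload) / (1024 * 1024)

    # 3. Reject APIC error/warning responses outright.
    for msg in result.get("messages", []):
        if msg.get("severity") in ("error", "warning"):
            return {"success": False, "error": "apic_error_response"}

    # 4. Enforce the size limit.
    if size_mb > max_snapshot_size_mb:
        return {"success": False, "error": "snapshot_too_large",
                "size_mb": round(size_mb, 1), "limit_mb": max_snapshot_size_mb}

    # 5-6. Hash, then skip exact repeats of the previous snapshot.
    digest = hashlib.sha256(payload).hexdigest()
    if not store_duplicates and digest == previous_hash:
        return {"success": True, "skipped": True, "reason": "duplicate"}

    # 7. First snapshot: False. Otherwise True whenever the hash moved.
    has_changes = previous_hash is not None and digest != previous_hash
    # 8-9. Persist the row (omitted here) and report what happened.
    return {"success": True, "skipped": False,
            "snapshot_hash": digest, "has_changes": has_changes}
```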

Why a capture gets rejected

Three real rejection reasons:

APIC error response

{
  "success": false,
  "error": "apic_error_response",
  "reason": "APIC returned an error or warning — snapshot not saved to prevent false drift alarms."
}

The query hit APIC but the response came back with a messages array flagging error or warning. Typical triggers: expired token, permission-denied, server timeout, query too large. Fix the underlying APIC call and retry — don't force-save the bad result.
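
For reference, a response that trips this check looks roughly like the following (the shape is assumed from the description above, a messages array with a severity field; the surrounding fields and text are illustrative):

```json
{
  "totalCount": "0",
  "imdata": [],
  "messages": [
    {"severity": "error", "text": "Token was invalid (Error: Token timeout)"}
  ]
}
```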

Snapshot too large

{
  "success": false,
  "error": "snapshot_too_large",
  "size_mb": 23.4,
  "limit_mb": 10
}

The serialized result exceeds the per-user max_snapshot_size_mb. Two options:

  • Narrow the query. Use filters and post-processors to drop attributes you don't need. Smaller result → smaller snapshot.
  • Raise the limit. Edit the user's Time Machine settings. Reasonable ceilings: 10 MB default, 50 MB for big-tenant queries, 100 MB for compliance dumps you genuinely need verbatim.

Raising the limit is usually wrong if the query can be narrowed — snapshots are jsonb blobs; storing tens of MB each adds up fast.
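
Before raising anything, it helps to measure what the pipeline will see. A quick pre-check, assuming the size is computed over the UTF-8 bytes of the JSON serialization (an assumption about the exact settings):

```python
import json

def result_size_mb(result):
    """MB of the JSON-serialized result, the quantity the limit applies to."""
    return len(json.dumps(result).encode("utf-8")) / (1024 * 1024)

# Dropping unneeded attributes shrinks the snapshot roughly in proportion.
full = {"rows": [{"dn": f"uni/ep-{i}", "notes": "x" * 200} for i in range(2000)]}
slim = {"rows": [{"dn": r["dn"]} for r in full["rows"]]}
```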

Duplicate (not really a rejection)

{
  "success": true,
  "skipped": true,
  "reason": "duplicate",
  "previous_snapshot_id": "<uuid>"
}

success: true but skipped: true means the hash matched the previous snapshot and store_duplicates is off. The previous snapshot ID is returned so the caller can reference it — nothing new was written, and nothing is broken.

To force every run to persist (even identical ones), turn Store duplicates on in settings. Almost nobody wants this; the default behavior is healthier.
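
The dedup decision is only as good as the hash, and the hash is only stable if serialization is deterministic. A sketch assuming key-sorted JSON as the canonical form (an implementation assumption):

```python
import hashlib
import json

def stable_hash(result):
    """SHA-256 over a canonical, key-sorted JSON serialization."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

a = {"fault": "F1234", "count": 3}
b = {"count": 3, "fault": "F1234"}  # same data, different key order
```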

Annotations and labels

Two small fields make a snapshot dramatically easier to find later:

  • Annotation — free-form note, up to a few paragraphs. "Pre-maintenance snapshot for CR-4821. BGP changes starting 02:00."
  • Label — short tag, up to 100 chars. baseline, pre-change, post-change, incident-20260422.

Both are editable after the fact from the snapshot detail view. Labels work particularly well as a poor-man's category — filter the snapshot list by label to pull up everything tagged baseline across all queries.

Version metadata

Every snapshot stores:

  • query_version_hash — short SHA of the query structure.
  • major_version and minor_version — the semantic version at capture time.

The UI uses these to warn on cross-version comparisons. The hash is indexed, so grouping "all snapshots of this query while it was at version 2.3" is a single index lookup.
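
In application code, the same grouping is a one-pass bucket by hash. A toy example with invented rows (field names follow the description above):

```python
from collections import defaultdict

snapshots = [
    {"id": 1, "query_version_hash": "a1b2c3", "major_version": 2, "minor_version": 3},
    {"id": 2, "query_version_hash": "a1b2c3", "major_version": 2, "minor_version": 3},
    {"id": 3, "query_version_hash": "d4e5f6", "major_version": 2, "minor_version": 4},
]

# All snapshots captured while the query was at a given structure version.
by_version = defaultdict(list)
for snap in snapshots:
    by_version[snap["query_version_hash"]].append(snap["id"])
```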

Deleting a saved query deletes its Time Machine history. The foreign key is on_delete=CASCADE — when the query goes, the snapshots go with it. If you need to retain history past the query's lifetime, export the snapshots first (future roadmap; currently manual via the detail view).

Retention isn't applied at capture time

On purpose. Retention cleanup runs on a schedule (daily at 03:30), not inline with capture. The reason is performance: cleanup is a set-based SQL delete that can touch tens of thousands of rows. Running it inside the capture path would stall APIC workers for seconds at a time under load.

Net effect: you can capture more aggressively than your retention policy implies, and the daily cleanup will harmonize things overnight. See Retention and settings for the full behavior.
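
The performance argument is concrete: one set-based delete versus thousands of per-row deletes. A sketch using an in-memory SQLite table (table and column names are invented; the real cleanup targets the snapshot table in Postgres):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot (id INTEGER PRIMARY KEY, created_at TEXT)")

now = datetime(2026, 4, 22, 3, 30)
conn.executemany(
    "INSERT INTO snapshot VALUES (?, ?)",
    [(1, (now - timedelta(days=120)).isoformat()),  # past retention
     (2, (now - timedelta(days=5)).isoformat())],   # inside retention
)

# One set-based delete removes every expired row in a single statement,
# instead of looping over rows inside the capture path.
cutoff = (now - timedelta(days=90)).isoformat()
conn.execute("DELETE FROM snapshot WHERE created_at < ?", (cutoff,))
```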

Troubleshooting

Capture issues that come up often:

  • "Save Snapshot button does nothing." Check the browser console — it's usually a 413 / 502 from the backend because the payload is large. Check backend logs for snapshot_too_large or apic_error_response.
  • "My scheduled task fires but no snapshots appear." The task may not have capture enabled, or every run is being deduplicated. Check a recent execution record: if it reports skipped, the data hasn't changed.
  • "Two snapshots in a row even though nothing changed." store_duplicates is on in your settings. Turn it off for the default behavior.
  • "APIC error response" rejections in logs. The underlying query is failing against APIC. Run it manually; you'll see the same error. Fix auth or query syntax and the captures start working.
  • "The snapshot size limit is too tight for a legitimate use." Raise the per-user limit. Keep the global default low and raise it for individual power users.
  • "I deleted the saved query and my snapshots disappeared." Cascade delete — they're gone. Restore from backups or re-create.

Captures are cheap once dedup is working. The next page — Comparing and drift — is where the value shows up: turning a pile of snapshots into an answer to "what changed."