Executions
Live AWX job monitoring — streaming playbook output, per-host outcomes, relaunch and cancel, and how the backend keeps every event in sync with the browser.
An AutomationExecution is the AWX job launched for a request. One request, one execution — bulk mode is the only mode, so even a 200-row submission produces exactly one AWX job and one execution record.
This page is about what happens from the moment AWX accepts the launch to the moment the job finishes: live output streaming, per-host results, relaunch, cancel, and the plumbing that keeps the UI honest.
The execution record
Every execution carries:
| Field | What it holds |
|---|---|
| awx_job_id | AWX's numeric job ID. null briefly between Fabrik creating the execution and AWX accepting the launch. |
| awx_job_url | Direct link to the job in AWX's own UI. |
| status | pending, waiting, running, successful, failed, error, canceled. |
| progress_percentage | Tasks completed, parsed from AWX playbook counts. |
| current_task | The most recent task name AWX reported. |
| elapsed_seconds | Populated on completion. |
| awx_job_data | The full AWX job JSON — stored verbatim for later analysis. |
| artifacts | Whatever the playbook set as artifacts. |
| row_results | Per-host outcomes parsed from artifacts after completion. |
| relaunch_of / relaunch_count | Link back to the original execution if this is a relaunch. |
The record is the canonical source of truth on the Fabrik side. Output lives separately in JobOutputChunk.
Live monitoring
AWX → Executions is the real-time monitor. It lists running and recently-completed executions and, when you click a row, opens the Job Output Viewer in a dialog.
The output viewer has three zones:
- Header — template name, AWX job ID, current status, progress bar.
- Terminal — the streamed stdout, rendered in a terminal-style font with color-coding for task results (`ok`, `changed`, `failed`, `skipped`).
- Footer — elapsed time, controls (Cancel, Relaunch, Open in AWX).
Streaming: how the pipeline works
Fabrik doesn't wait for AWX to finish before showing output. A dedicated Celery task — stream_job_output — spins up per execution and walks AWX's /job_events/ endpoint:
- Poll AWX every ~0.5 seconds for new events.
- Persist each event as a `JobOutputChunk` row, keyed by `(execution, counter)` so retries don't dupe.
- Publish the event to a Redis channel.
- WebSocket consumer reads the channel and pushes to the browser.
- Frontend appends to the terminal viewer.
The full output stays in JobOutputChunk — you can close the viewer mid-run and reopen it later without losing anything. Historical executions replay from the stored chunks.
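The steps above can be sketched as a single polling loop. This is a minimal, illustrative version — the function parameters (`fetch_job_events`, `save_chunk`, `publish`, `job_is_finished`) are stand-ins for Fabrik's real AWX client, ORM, and Redis publisher, not its actual API:

```python
import time

def stream_job_output(execution_id, fetch_job_events, save_chunk, publish,
                      job_is_finished, poll_interval=0.5):
    """Walk AWX's /job_events/ for one execution: persist each event as a
    JobOutputChunk-style row, publish it for the WebSocket layer, and track
    the highest counter seen so a retry never duplicates output."""
    cursor = 0
    while True:
        events = fetch_job_events(execution_id, after_counter=cursor)
        for event in events:
            # (execution, counter) is unique, so re-saving after a retry is a no-op.
            save_chunk(execution_id, event["counter"], event["stdout"])
            publish(execution_id, event)
            cursor = max(cursor, event["counter"])
        if not events and job_is_finished(execution_id):
            return cursor
        time.sleep(poll_interval)
```

The cursor-before-sleep ordering matters: an event is persisted and published before the cursor moves past it, so a crash between poll cycles can re-send an event but never skip one.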
Why chunks instead of a single stdout field
AWX playbooks regularly produce megabytes of output. Storing it all in a single PostgreSQL TEXT field works but doesn't scale — and truncation is the worst failure mode for a compliance audit. Splitting into chunks (one row per AWX event) keeps the stream bounded, lets us stream incrementally, and survives pagination if AWX returns >100 events between polls.
Reliability: cursors, keepalives, watchdog
The poller is designed to survive Celery worker restarts without re-sending every event to the frontend. Two Redis keys are involved:
- Cursor (`awx:stream:cursor:{execution_id}`) — the last `counter` seen. If the task dies and restarts, it resumes from here.
- Keepalive (`awx:stream:alive:{execution_id}`) — updated every poll cycle, 30-second TTL. A watchdog checks this periodically; if it's expired and the execution is still running, the watchdog restarts the poller.
The design assumption: AWX is the source of truth for what happened, and the poller is a faithful-but-restartable copy mechanism. Temporary failures don't lose data; they just delay streaming.
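The watchdog's decision logic fits in a few lines. A sketch under stated assumptions — `redis_get`, `execution_is_running`, and `restart_poller` are illustrative callables, and a keepalive whose TTL has lapsed simply reads back as absent:

```python
CURSOR_KEY = "awx:stream:cursor:{execution_id}"
ALIVE_KEY = "awx:stream:alive:{execution_id}"

def watchdog_check(redis_get, execution_id, execution_is_running, restart_poller):
    """If the keepalive key has expired (TTL lapsed, key gone) while the
    execution is still running, the poller is presumed dead: restart it
    from the stored cursor so no events are re-sent to the frontend."""
    if redis_get(ALIVE_KEY.format(execution_id=execution_id)) is not None:
        return False  # poller heartbeated recently; nothing to do
    if not execution_is_running(execution_id):
        return False  # execution finished; a missing keepalive is expected
    cursor = int(redis_get(CURSOR_KEY.format(execution_id=execution_id)) or 0)
    restart_poller(execution_id, resume_from=cursor)
    return True
```

Resuming from the cursor rather than zero is what makes the restart invisible to the browser: only events after the last persisted counter are fetched again.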
Heartbeats
Even during quiet moments — a playbook task that takes 30 seconds without producing output — the poller emits a WebSocket heartbeat every 5 seconds. The frontend uses this to detect a stuck stream without waiting for new events. If no heartbeat arrives for ~15 seconds, the viewer shows a "reconnecting" indicator.
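The staleness check itself is trivial; what matters is the ratio between the two intervals. A sketch of the detection rule (the constants match the prose above; the frontend equivalent would run this on a timer):

```python
HEARTBEAT_INTERVAL = 5   # seconds between poller heartbeats
STALE_AFTER = 15         # viewer shows "reconnecting" past this gap

def stream_looks_stuck(last_message_at, now):
    """True when neither an event nor a heartbeat has arrived for ~15s.
    Both arguments are epoch seconds. 15s allows two missed heartbeats
    of jitter before declaring the stream stuck."""
    return now - last_message_at > STALE_AFTER
```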
Per-host outcomes (row_results)
After an AWX job finishes, the job monitor parses its artifacts and extracts per-host results — which hosts reported ok, changed, failed, unreachable. This lands on the execution's row_results field.
The execution detail view renders these as a table:
| Host | Status | Tasks ok | Changed | Failed | Unreachable |
|---|---|---|---|---|---|
| leaf-101.lab | ok | 12 | 3 | 0 | 0 |
| leaf-102.lab | failed | 8 | 1 | 2 | 0 |
This is the key thing bulk mode gives up: in per-row mode you'd have a distinct AWX job per row, making row-level status trivial. In bulk mode, every row goes to one job and you rely on the playbook reporting per-host stats in its artifacts for this view to populate. Playbooks that don't emit structured artifacts will show an empty row_results — the execution still ran correctly, it just can't be diced up by row after the fact.
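A parser for that shape might look like the following. This is a sketch, not Fabrik's code — the stat key names (`ok`, `changed`, `failures`, `unreachable`) mirror Ansible's per-host play-recap stats, but what actually lands in `artifacts` depends entirely on what the playbook emits:

```python
def parse_row_results(host_stats):
    """Collapse per-host Ansible-style stats from a job's artifacts into
    the per-host outcome table: any failure or unreachable count marks
    the host as failed."""
    results = {}
    for host, stats in host_stats.items():
        failed = stats.get("failures", 0)
        unreachable = stats.get("unreachable", 0)
        results[host] = {
            "status": "failed" if (failed or unreachable) else "ok",
            "ok": stats.get("ok", 0),
            "changed": stats.get("changed", 0),
            "failed": failed,
            "unreachable": unreachable,
        }
    return results
```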
Status lifecycle
pending → waiting → running → successful
                            ↘ failed
                            ↘ error
                            ↘ canceled

- pending — Fabrik has created the execution but hasn't yet launched the AWX job. Usually seconds.
- waiting — AWX has accepted the launch but is queued behind other jobs.
- running — AWX is actually executing the playbook.
- successful — AWX reports success. Note: this means AWX's own criteria are met; individual hosts may still have failed. Check `row_results` for the finer picture.
- failed — The playbook failed in the Ansible sense (task failures exceeded the tolerance AWX was configured with).
- error — AWX itself failed to run the job (credential issue, missing inventory, etc.).
- canceled — Someone clicked Cancel or the AWX job was canceled from AWX's own UI.
Terminal statuses (successful, failed, error, canceled) stop the poller and freeze the execution record.
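The lifecycle can be expressed as a transition table; the set of allowed transitions below is our reading of the diagram above (cancel is possible from any non-terminal state, error can occur at any point before completion), not a dump of Fabrik's state machine:

```python
ALLOWED = {
    "pending": {"waiting", "error", "canceled"},
    "waiting": {"running", "error", "canceled"},
    "running": {"successful", "failed", "error", "canceled"},
}

def is_terminal(status):
    # successful/failed/error/canceled never appear as transition sources.
    return status not in ALLOWED

def can_transition(old, new):
    return new in ALLOWED.get(old, set())
```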
Cancel
The Cancel button sends a POST to AWX's /jobs/{id}/cancel/ endpoint. AWX stops the job at the next task boundary — not mid-task. What's already been applied stays applied; check_mode on the original request is irrelevant here (cancel isn't a rollback).
Status transitions to canceled; the poller drains any remaining events and shuts down.
Cancel is always available while the job is pending, waiting, or running. Once terminal, the button is hidden.
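The cancel call itself is a single POST. Sketched here as a request builder so it stays side-effect free — the `Bearer` auth scheme is an assumption about how the AWX token is presented, and in practice the dict would be handed to an HTTP client:

```python
def build_cancel_request(awx_base_url, token, job_id):
    """Construct the request behind the Cancel button: a POST to AWX's
    /api/v2/jobs/{id}/cancel/ endpoint. AWX stops the job at the next
    task boundary, so this is a stop signal, not a rollback."""
    return {
        "method": "POST",
        "url": f"{awx_base_url}/api/v2/jobs/{job_id}/cancel/",
        "headers": {"Authorization": f"Bearer {token}"},
    }
```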
Relaunch
The Relaunch button creates a new execution against the same request:
- Same template snapshot.
- Same input data.
- Same APIC and AWX credential.
- `relaunch_of` points back to the original execution.
- `relaunch_count` increments with each link in the chain (0 = original, 1 = first relaunch).
- Chain depth is capped at 3 — beyond that, fix the underlying issue rather than relaunching.
Relaunches are visible in the execution list with a distinctive indicator. The chain is preserved so you can walk back through "this was the 3rd attempt, here's what failed on attempt 2."
Relaunch uses the frozen template snapshot — even if the template has been edited since, the relaunch runs exactly what the original request would have.
Under the hood, relaunch routes through the same execution path as the original launch — not AWX's /relaunch/ endpoint. For workflow templates this is what makes relaunch reliable: each relaunch creates a fresh ephemeral workflow clone with the credential bound on its nodes (see Templates → Workflow clones). AWX's own /relaunch/ would re-snapshot from the post-cleanup template config and lose the node-level credentials, surfacing as apic_host is undefined inside the playbook.
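The record-creation side of relaunch can be sketched like this. Field names mirror the execution-record table at the top of this page; the exact structure is illustrative, but the two invariants — carry the frozen snapshot and inputs over untouched, refuse past depth 3 — are the ones described above:

```python
MAX_RELAUNCH_DEPTH = 3

def make_relaunch(execution):
    """Build the new execution record for a relaunch: same frozen template
    snapshot, same input data, chain links updated, depth capped at 3."""
    if execution["relaunch_count"] >= MAX_RELAUNCH_DEPTH:
        raise ValueError("relaunch chain capped at 3; fix the underlying issue")
    return {
        "relaunch_of": execution["id"],
        "relaunch_count": execution["relaunch_count"] + 1,
        "template_snapshot": execution["template_snapshot"],  # frozen copy, not current template
        "input_data": execution["input_data"],
        "status": "pending",
    }
```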
Opening the AWX job directly
Every execution carries awx_job_url, a direct link to the job in AWX's own UI. For deeper diagnostics — environment dumps, credential visibility, raw inventory — AWX's UI is usually the better tool. Fabrik gets out of the way with the Open in AWX button.
Retention
Executions don't auto-expire. Old executions stay in the database along with their JobOutputChunk rows.
For deployments with lots of high-volume automations, periodic cleanup is on the roadmap but currently manual — a Django admin or manage.py shell deletion query is the standard approach.
Output size and storage
A single playbook can produce tens of thousands of events. The chunk-per-event model scales fine — PostgreSQL handles the row count comfortably, and retrieval is by execution_id + counter index. The per-execution output view pages through chunks by counter range.
For very large outputs (hundred-thousand-event playbooks), the terminal viewer uses virtualization so the browser doesn't choke on DOM size.
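Paging by counter range is straightforward because counters are dense and monotonic per execution. A sketch (the page size of 1000 is an illustrative choice, not Fabrik's configured value):

```python
def chunk_pages(max_counter, page_size=1000):
    """Yield inclusive (start, end) counter ranges for fetching a big
    execution's JobOutputChunk rows one page at a time."""
    for start in range(1, max_counter + 1, page_size):
        yield (start, min(start + page_size - 1, max_counter))
```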
Troubleshooting
Execution issues that come up often:
- "Execution stuck in pending." Celery workers may be down, or the AWX launch call failed silently. Check `docker compose logs celery-worker` for the launch attempt.
- "Output stopped streaming mid-run." The watchdog should restart the poller within a minute. If it doesn't, inspect Redis for the keepalive key and the Celery worker for the `stream_job_output` task.
- "row_results is empty but the job was successful." The playbook doesn't emit per-host artifacts the monitor can parse. Playbook authors can add a final task that writes a structured artifact AWX exposes as `artifacts`.
- "Cancel didn't stop the job." AWX cancellation stops at the next task boundary. If the current task is long-running (a module that doesn't check for cancellation), it may take minutes to respond.
- "Relaunch failed with a validation error." The template's validation query may now return different values than when the original was submitted — e.g., the tenant the original targeted has since been deleted. Relaunches re-run validation against current state.
- "Workflow run fails with `apic_host is undefined`." This was a known issue with workflow relaunches in older versions, where AWX's `/relaunch/` endpoint re-snapshotted from a template whose node-level credentials had already been cleaned up. The clone-based launch path (see Templates → Workflow clones) eliminates it; if you still see it, check that the AWX token has Workflow Admin on the org so the clone can be created in the first place.
- "Output viewer shows old events when reopened." Expected — the viewer loads history from `JobOutputChunk` and then resumes live streaming if the execution is still running.
- "Status says successful but hosts failed." AWX marks a job `successful` based on its own tolerance config. The per-host `row_results` is the finer signal; treat `successful` + non-zero host failures as a partial success.
That's AWX automation end-to-end — connections, templates, validations, requests, and executions. The next section — Time Machine — is about snapshotting APIC state over time, diffing, and drift detection.