Engine Lifecycle

The Alchemist engine is a background loop that claims queued jobs, processes them, and manages concurrent execution. This page documents all states, what triggers each transition, and the exact behavior during cancel, pause, drain, and restart.

Engine states

State	Jobs start?	Active jobs affected?	How to enter
Running	Yes	Not affected	Resume, restart
Paused (manual)	No	Active jobs continue	`POST /api/engine/pause`, Runtime pause control
Paused (scheduler)	No	Not cancelled	Schedule window activates
Draining	No	Run to completion	Header → Stop (while running), `POST /api/engine/drain`
Restarting	No (briefly)	Cancelled	`POST /api/engine/restart`
Shutdown	No	Force-cancelled	Process exit / SIGTERM

Paused-manual and paused-scheduler are independent. Both must be cleared for jobs to start again.

State transitions

             Resume
  ┌──────────────────────────────┐
  │                              ▼
Paused ◄─── Pause ─────── Running ──── Drain ───► Draining
  │                         ▲  │                     │
  │         Restart          │  └─── Shutdown ──►  Shutdown
  │      ┌──────────┐        │
  └─────►│ Restart  │────────┘
         └──────────┘
         (brief pause,
         cancel in-flight,
         then resume)

Pause

Sets manual_paused = true.
The claim loop polls every 2 seconds and blocks while paused.
Active jobs continue until they finish naturally.
Does not affect draining state.

Resume

Clears manual_paused.
Does not clear scheduler_paused (scheduler manages its own flag).
The claim loop immediately resumes on the next iteration.
Does not cancel the drain if draining.

Drain

Sets draining = true without setting paused.
No new jobs are claimed.
Active jobs run to completion.
When in_flight_jobs reaches zero: drain completes, draining is cleared, engine transitions to Paused (manual).

Restart

Pause (set manual_paused = true).
Cancel all in-flight jobs (Encoding, Remuxing, Analyzing, Resuming) via FFmpeg kill signal.
Clear draining flag.
Clear idle_notified flag.
Resume (clear manual_paused).

Cancelled in-flight jobs are marked failed with failure_summary = "cancelled". They are eligible for automatic retry per the retry backoff schedule.

Shutdown

Called when the process exits (SIGTERM / graceful shutdown):

Cancel all active jobs via FFmpeg kill.
Wait up to a short timeout for kills to complete.
No retry is scheduled — the jobs return to queued on next startup.

Job states

Job state	Meaning	Terminal?
`queued`	Waiting to be claimed	No
`analyzing`	FFprobe running on the file	No
`encoding`	FFmpeg encoding in progress	No
`remuxing`	FFmpeg stream-copy in progress	No
`resuming`	Job being re-queued after retry	No
`completed`	Encode finished successfully	Yes
`skipped`	Planner decided not to transcode	Yes
`failed`	Encode or analysis failed	Yes (with retry)
`cancelled`	Cancelled by operator	Yes (with retry)

Retry backoff

Failed and cancelled jobs are automatically retried. The engine checks elapsed time before claiming.

Attempt #	Backoff before retry
1	5 minutes
2	15 minutes
3	60 minutes
4+	6 hours

After 3 consecutive failures with no success, the job still retries on the 6-hour schedule. There is no permanent failure state from retries alone — operator must manually delete or cancel the job to stop retries.

Cancel semantics

Cancel mid-analysis

FFprobe process is not currently cancellable via signal. The cancel flag is checked before FFprobe starts. If analysis is in progress when cancel arrives, the job will be cancelled after analysis completes (before encoding starts).

Cancel mid-encode

The FFmpeg process receives a kill signal immediately. The partial output file is cleaned up. The job is marked failed with failure_summary = "cancelled".

Cancel while queued

The job status is set to cancelled directly without any process kill.

Pause vs. drain vs. restart

Operation	In-flight jobs	Partial output	New jobs
Pause	Finish normally	Not affected	Blocked
Drain	Finish normally	Not affected	Blocked until drain completes
Restart	Killed	Cleaned up	Blocked briefly, then resume
Shutdown	Killed	Cleaned up	N/A

Use Pause when you need to inspect the queue or change settings without losing progress.

Use Drain when you want to stop gracefully after the current batch finishes (e.g. before maintenance).

Use Restart to force a clean slate — e.g. after changing hardware settings that affect in-flight jobs.

Boot sequence

Migrations run.
A valid hardware detection cache is loaded immediately if the runtime fingerprint still matches.
The web server and UI become available.
Hardware detection runs or refreshes in the background when the cache is missing or stale, then updates runtime state and the probe log.
Any jobs left in encoding, remuxing, analyzing, or resuming are reset to queued (crash recovery).
Boot analysis runs for queued jobs missing metadata. This uses a single-slot semaphore and blocks the claim loop.
Engine claim loop starts — jobs are claimed and processed up to the concurrent limit.

Engine states​

State transitions​

Pause​

Resume​

Drain​

Restart​

Shutdown​

Job states​

Retry backoff​

Cancel semantics​

Cancel mid-analysis​

Cancel mid-encode​

Cancel while queued​

Pause vs. drain vs. restart​

Boot sequence​