Skip to content

Operator Recovery

Use this pattern when jobs can fail in recoverable ways and operators need to inspect, dead-letter, and replay them from the terminal.

This is where localqueue feels more useful than a pile of shell scripts: failures stay visible, recovery is explicit, and replay does not require a custom one-off script.

Best fit

  • webhook delivery with occasional upstream outages
  • notification pipelines that need human inspection on failure
  • CLI-driven maintenance jobs where operators need a dead-letter queue

Why localqueue fits

  • failed jobs can move to dead-letter after a small retry budget
  • operators can inspect summary and full records locally
  • replay is one command instead of ad hoc recovery code

Boundaries

  • recovery is local to one machine and one shared store
  • this does not replace broker-level operations across multiple services

Setup

Create a scratch directory and a small worker script once:

from __future__ import annotations

import json
import sys


payload = json.load(sys.stdin)
address = payload["to"]
if payload.get("fail"):
    raise ConnectionError(f"could not deliver email to {address}")
print(f"sent email to {address}")

Minimal flow

Create a failing job:

echo '{"to":"broken@example.com","fail":true}' | localqueue queue add webhooks

Process it with a small retry budget so it reaches dead-letter quickly:

localqueue queue exec webhooks \
  --max-tries 2 \
  -- python /tmp/localqueue-use-cases/email_worker.py

Inspect the dead-letter summary and full record:

localqueue queue dead webhooks --summary

localqueue queue dead webhooks

Replay everything in the dead-letter queue:

localqueue queue requeue-dead webhooks --all