Troubleshooting (self-hosted)
Operational playbooks for failed tasks, stalled queues, Redis pressure, webhook retries, and DLQ handling.
These playbooks assume Linux containers or equivalent process supervision. When tailing logs, correlate workflow run IDs with task attempts.
Symptoms → checks
Failed jobs / red tasks
- Pull the workflow run timeline plus task stderr payloads from storage/logs.
- Compare configured timeouts against observed external latency; upstream systems often exceed the defaults (retries & timeouts).
- Compare enqueue-time proofs vs live definitions for drift after hotfixes landed mid-run (GET /api/proofs/verify-definition/{runId}) (drift detection).
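As a rough illustration of the drift check, the sketch below builds the verification URL for a run and fetches the result. Only the path comes from this page; the base URL, the bearer-token header, and the assumption that the endpoint returns JSON are hypothetical and should be adjusted to your deployment.

```python
import json
import urllib.request
from urllib.parse import quote


def verify_definition_url(base_url: str, run_id: str) -> str:
    """Build the drift-check URL for one workflow run."""
    # The run ID is percent-encoded so unusual characters cannot alter the path.
    return f"{base_url.rstrip('/')}/api/proofs/verify-definition/{quote(run_id, safe='')}"


def fetch_verification(base_url: str, run_id: str, token: str = "", timeout: float = 10.0) -> dict:
    """Call the endpoint and return the decoded body.

    Assumptions (not confirmed by this page): the response is JSON and,
    if authentication is required, it is a bearer token.
    """
    request = urllib.request.Request(verify_definition_url(base_url, run_id), method="GET")
    request.add_header("Accept", "application/json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the URL builder separate from the network call makes it easy to log the exact URL alongside the run ID when filing an incident.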
Stuck queues / no progress
Concrete checks:
- Confirm workers subscribed to Redis streams/queues backing your deployment (queue workers).
- Inspect heartbeat and stale-consumer metrics; hung tasks sometimes indicate DB connection saturation.
- Check whether exclusive locks are blocking tasks that cannot run in parallel.
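If the queues are backed by Redis streams with consumer groups (an assumption; your deployment may use a different structure), the stale-consumer check can be sketched as a filter over the fields that XINFO CONSUMERS reports for each consumer: name, pending (entries delivered but not acknowledged), and idle (milliseconds since the last interaction). The 60-second threshold is an illustrative value, not a product default.

```python
from typing import Iterable, List, Mapping


def find_stale_consumers(consumers: Iterable[Mapping], max_idle_ms: int = 60_000) -> List[str]:
    """Return names of consumers that hold pending entries but have gone quiet.

    Each mapping is expected to carry the XINFO CONSUMERS fields
    'name', 'pending', and 'idle' (milliseconds).
    """
    stale = []
    for consumer in consumers:
        # An idle consumer with nothing pending is harmless; one that is idle
        # while still holding unacknowledged entries is a likely hung worker.
        if consumer["pending"] > 0 and consumer["idle"] > max_idle_ms:
            stale.append(consumer["name"])
    return sorted(stale)
```

A consumer flagged here is a candidate for closer inspection (database connections, locks) rather than proof of a hang on its own.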
Redis issues
Indicators: elevated latency spikes, eviction warnings, persistence failures.
Mitigations:
- Ensure bounded memory, correct maxmemory-policy, and backups per platform guidance.
- Never share one Redis ACL between unrelated fleets; doing so risks noisy-neighbour starvation.
- If using Redis clustering, verify that cascaded clients respect the routing and timeout knobs in the environment bundles relevant to Cascades deployments.
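The memory settings above can be checked mechanically. This sketch inspects a mapping shaped like the output of CONFIG GET for maxmemory and maxmemory-policy (values arrive as strings). The default expected policy of noeviction is an assumption, chosen because evicting queue data would drop work; follow your platform guidance if it names a different policy.

```python
from typing import List, Mapping

# Eviction policies that Redis accepts for maxmemory-policy.
VALID_POLICIES = {
    "noeviction",
    "allkeys-lru", "allkeys-lfu", "allkeys-random",
    "volatile-lru", "volatile-lfu", "volatile-random", "volatile-ttl",
}


def check_memory_config(config: Mapping[str, str], expected_policy: str = "noeviction") -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    # maxmemory of 0 means Redis applies no memory limit.
    if int(config.get("maxmemory", "0")) == 0:
        problems.append("maxmemory is 0 (unbounded)")
    policy = config.get("maxmemory-policy", "")
    if policy not in VALID_POLICIES:
        problems.append(f"unknown maxmemory-policy: {policy!r}")
    elif policy != expected_policy:
        problems.append(f"maxmemory-policy is {policy}, expected {expected_policy}")
    return problems
```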
Replaying failures
- Freeze context: record the enqueue snapshot hash, task IDs, and integration versions.
- Determine whether the rerun should be an idempotent replay or a repair-run (failure recovery).
- For vendor outages, enqueue follow-up remediation once SLAs stabilize—preserve original execution proofs untouched for audit timelines.
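One way to freeze context before a rerun is to capture it in an immutable record that can be attached to the incident. The sketch below is illustrative only: the class name, field names, and mode labels are hypothetical and do not correspond to a product schema.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Tuple


class RerunMode(Enum):
    IDEMPOTENT_REPLAY = "idempotent-replay"
    REPAIR_RUN = "repair-run"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after capture
class ReplayContext:
    run_id: str
    enqueue_snapshot_hash: str
    task_ids: Tuple[str, ...]
    integration_versions: Tuple[Tuple[str, str], ...]  # (integration, version) pairs
    mode: RerunMode

    def to_json(self) -> str:
        """Serialize with sorted keys so two captures can be compared textually."""
        record = asdict(self)
        record["mode"] = self.mode.value
        record["integration_versions"] = dict(self.integration_versions)
        return json.dumps(record, sort_keys=True)
```

For example, `ReplayContext("run-1", "abc123", ("t1", "t2"), (("github", "1.2.0"),), RerunMode.REPAIR_RUN).to_json()` yields a stable JSON string that can be pasted into a ticket.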
Dead-letter queues (DLQ)
When tasks exhaust retries:
- Items land in DLQ-compatible stores for analysts; do not silently delete them.
- Replay only after RCA; attach ticketing references to follow-up automation changes.
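The DLQ rules above can be modelled in a few lines: a task is parked once its attempts are exhausted, nothing is deleted implicitly, and replay is refused until a ticket reference is attached. This is an in-memory sketch with hypothetical names, not the storage layer a real deployment would use.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeadLetter:
    task_id: str
    attempts: int
    last_error: str
    ticket_ref: Optional[str] = None  # set once RCA is filed


class DeadLetterQueue:
    def __init__(self) -> None:
        self._items: Dict[str, DeadLetter] = {}

    def park(self, task_id: str, attempts: int, last_error: str) -> None:
        self._items[task_id] = DeadLetter(task_id, attempts, last_error)

    def attach_ticket(self, task_id: str, ticket_ref: str) -> None:
        self._items[task_id].ticket_ref = ticket_ref

    def replay(self, task_id: str) -> DeadLetter:
        item = self._items[task_id]
        # Gate replay on RCA: an item without a ticket stays parked.
        if not item.ticket_ref:
            raise PermissionError("replay blocked: no RCA ticket attached")
        return self._items.pop(task_id)


def on_failure(dlq: DeadLetterQueue, task_id: str, attempt: int, max_attempts: int, error: str) -> str:
    """Decide between another retry and dead-lettering after a failed attempt."""
    if attempt >= max_attempts:
        dlq.park(task_id, attempt, error)
        return "dead-lettered"
    return "retry"
```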
Webhooks from GitHub/GitLab never arrive
Walk through the integration-specific debugging steps in the integrations and webhooks guides: signature mismatch, rotated secrets, VPC egress.
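To rule out a signature mismatch or rotated secret, it can help to recompute the check by hand against a captured delivery. GitHub signs the raw request body with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header as "sha256=" followed by the hex digest. GitLab sends the configured secret token verbatim in X-Gitlab-Token. The function names below are illustrative and not part of the product.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode(), (header_value or "").encode())


def verify_gitlab_token(secret: str, header_value: str) -> bool:
    """Check an X-Gitlab-Token header against the configured secret token."""
    return hmac.compare_digest(secret.encode(), (header_value or "").encode())
```

The body must be the exact bytes received. Re-serializing parsed JSON before hashing is a common cause of mismatches, as is verifying against a secret that was rotated on only one side.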