Workqueue saturation see history edit this page

Talks about: , , , and

Symptom

workqueue_depth{controller="stageset"} stays high; StageSets reconcile slowly or lag behind their spec.interval. The StageSetControllerWorkqueueDepthHigh alert fires (see operations for the alert set and its thresholds).

Cause

The controller is enqueuing reconcile requests faster than it completes them. Common causes:

Diagnosis

# which StageSets are churning?
kubectl get stagesets -A --sort-by=.status.observedGeneration
# controller logs for slow operations / retries
kubectl -n stageset-system logs deploy/stageset-controller --tail=200

Correlate with controller_runtime_reconcile_time_seconds (see reconcile latency) and apiserver latency.

Remediation