Controller pod down see history edit this page

Talks about: , , , and

Symptom

A stageset-controller pod is NotReady; the StageSetControllerPodDown alert fires. While no replica is Ready, StageSets are not reconciled and the Kubernetes admission webhook may reject StageSet writes (failurePolicy: Fail).

Cause

Diagnosis

kubectl -n stageset-system get pods -l app.kubernetes.io/name=stageset-controller
kubectl -n stageset-system describe pod <pod>
kubectl -n stageset-system logs <pod> --previous --tail=200

Look for flag-parse errors at startup, RBAC Forbidden on the controller’s own ServiceAccount, or OOMKills.

Remediation

See operations for the full alert set and its thresholds.