Runbooks
- ArtifactNotFound
The referenced ExternalArtifact could not be found (transient; the controller requeues).
- Controller pod down
A stageset-controller pod has been NotReady for the alert window.
- DependencyNotReady
A StageSet named in spec.dependsOn is not yet Ready.
- DowngradeRequiresMigration
The desired version is below the deployed version and a migration boundary blocks the downgrade.
- InvalidSpec
The StageSet spec is invalid; the Message names the offending field or action.
- InvalidVersion
A version source or value could not be parsed as semver.
- Operations
Metrics, alerts, events, and runbooks for running the controller day to day.
- PreviousRevisionUnavailable
rollbackOnFailure is set but the last-good revisions could not be restored.
- Reconcile latency high
Reconcile p99 latency for the StageSet controller is above threshold.
- ResolveFailed
A source reference could not be resolved to a ready ExternalArtifact.
- SourceNotReady
The source exists but has not published a ready artifact yet (transient).
- StageFailed
A stage failed to fetch, build, apply, verify, or run an action; the run halts there.
- Stalled
The run cannot make progress and will not retry until the spec changes.
- Succeeded
All stages applied and verified the healthy steady state.
- Suspended
Reconciliation is paused via spec.suspend.
- UpdateDeferred
A new revision is held by a closed update window.
- Webhook cert renewal failing
The self-signed admission webhook certificate is not being rotated.
- Workqueue saturation
The controller cannot drain its reconcile queue fast enough.