StageSet

apiVersion: stages.metio.wtf/v1
kind: StageSet

A StageSet is a namespaced Kubernetes resource describing an ordered set of stages. Only spec.stages is required; everything else refines scheduling, security, gating, versioning, and rollback. Every field below is shown in YAML at least once.

The smallest valid StageSet:

apiVersion: stages.metio.wtf/v1
kind: StageSet
metadata:
  name: my-app
  namespace: default
spec:
  stages:
    - name: app
      sourceRef:
        name: my-app

Scheduling

spec:
  interval: 5m                  # optional: reconcile cadence (default: --default-interval)
  retryInterval: 1m             # cadence after a failed run (default: interval)
  driftDetectionInterval: 2m    # optional: correct drift more often than interval
  timeout: 5m                   # default per-stage timeout (optional)
  suspend: false                # pause reconciliation without deleting (default false)
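As a sketch of how these fields could combine into a requeue delay (not the controller's actual code), the function below picks retryInterval after a failed run, falls back to interval when retryInterval is unset, and, as an assumption, wakes at the shorter of interval and driftDetectionInterval after a successful run. The names next_requeue and default_interval (standing in for --default-interval) are hypothetical, and the duration parser only handles h/m/s units.

```python
from datetime import timedelta
from typing import Optional

_UNITS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse a simple duration such as '30s', '5m', '8h' or '1h30m' (no 'ms', no fractions)."""
    total, digits = 0, ""
    for ch in text:
        if ch.isdigit():
            digits += ch
        elif ch in _UNITS and digits:
            total += int(digits) * _UNITS[ch]
            digits = ""
        else:
            raise ValueError(f"unsupported duration: {text!r}")
    if digits:
        raise ValueError(f"duration is missing a unit: {text!r}")
    return timedelta(seconds=total)


def next_requeue(spec: dict, default_interval: str, last_run_failed: bool) -> Optional[timedelta]:
    """Hypothetical requeue policy built from the scheduling fields; None means 'schedule nothing'."""
    if spec.get("suspend", False):
        return None  # suspended: nothing is scheduled until suspend is cleared
    interval = parse_duration(spec.get("interval", default_interval))
    if last_run_failed:
        # retryInterval falls back to interval when unset
        return parse_duration(spec["retryInterval"]) if "retryInterval" in spec else interval
    drift = spec.get("driftDetectionInterval")
    # assumption: after a good run, wake for whichever comes first, drift check or full reconcile
    return min(interval, parse_duration(drift)) if drift else interval
```

With the values shown above, a healthy StageSet would wake every 2m and a failing one every 1m under this reading.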

Ordering between StageSets

dependsOn holds this StageSet until every listed StageSet is Ready at its observed generation, which orders releases across StageSets. (Ordering within a StageSet is the order of spec.stages.)

spec:
  dependsOn:
    - name: platform
      namespace: platform-system
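One reading of "Ready at their observed generation" is that a dependency counts only when its Ready condition is True and status.observedGeneration equals metadata.generation, so a dependency whose latest spec has not been reconciled yet still blocks. A minimal sketch of that check follows; the lookup callable is a hypothetical stand-in for an API client, and defaulting a dependency's namespace to the StageSet's own namespace is an assumption.

```python
from typing import Callable, List, Optional


def is_ready(obj: dict) -> bool:
    """True when the Ready condition is True for the object's current generation."""
    status = obj.get("status", {})
    if status.get("observedGeneration") != obj.get("metadata", {}).get("generation"):
        return False  # the controller has not reconciled the latest spec yet
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported yet


def blocking_dependencies(
    depends_on: List[dict],
    lookup: Callable[[str, str], Optional[dict]],  # hypothetical: (namespace, name) -> StageSet or None
    own_namespace: str,
) -> List[str]:
    """Return namespace/name of every dependency that still gates this StageSet."""
    blocked = []
    for ref in depends_on:
        namespace = ref.get("namespace", own_namespace)  # assumed default
        obj = lookup(namespace, ref["name"])
        if obj is None or not is_ready(obj):
            blocked.append(f"{namespace}/{ref['name']}")
    return blocked
```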

Security and targeting

spec:
  serviceAccountName: payments-deployer   # impersonated for every cluster operation
  kubeConfig:
    secretRef:
      name: prod-eu-kubeconfig            # apply to a remote cluster
  decryption:
    provider: sops                        # decrypt SOPS files in stage sources
    secretRef:
      name: sops-age                      # holds an age key under *.agekey

Versioning and migrations

Versioning is off unless spec.version is set. Set exactly one of value / fromObject / fromArtifact:

spec:
  version:
    # fromObject reads the version from a rendered object — by default the
    # app.kubernetes.io/version label, so it travels in the manifests (works for
    # every source kind, including JaaS). The recommended default.
    fromObject:
      stage: app
      kind: Deployment
      name: web
      # apiVersion: apps/v1            # optional; narrows an ambiguous Kind+Name
      # fieldPath: "{.data.version}"   # optional JSONPath; defaults to the version label
    # value: "2.1.0"                   # …or pin it inline
    # fromArtifact: { stage: app, path: VERSION }   # …or read a VERSION file (Git/OCI/Bucket)

  migrations:
    - name: backfill-ledger-2-0 # key in the idempotency ledger; name used in Events
      from: "1.*"               # optional: constrain the version it applies from
      to:   "2.0.0"             # required: the boundary this migration crosses
      stage: app                # runs before this stage's pre-actions
      actions:                  # the same Action shape used by stages (see below)
        - name: backfill
          job:
            sourceRef:
              name: ledger-backfill-job

See versioned migrations.
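The reference above says a migration has a required to boundary and an optional from constraint, and that exactly one version source is set. The sketch below is one plausible way to pick the migrations for an upgrade. Treating a migration as due when its to version lies in the half-open range (current, target], matching from as a shell-style glob, ordering by boundary, and skipping migrations on a first install are all assumptions, not documented behaviour; the function names are hypothetical.

```python
import fnmatch
from typing import List, Optional, Tuple

VERSION_SOURCES = ("value", "fromObject", "fromArtifact")


def version_source(spec_version: dict) -> str:
    """Enforce the 'exactly one of value / fromObject / fromArtifact' rule."""
    chosen = [key for key in VERSION_SOURCES if key in spec_version]
    if len(chosen) != 1:
        raise ValueError(f"set exactly one of {VERSION_SOURCES}, got {chosen}")
    return chosen[0]


def parse_semver(version: str) -> Tuple[int, int, int]:
    """'v2.1' -> (2, 1, 0); pre-release and build suffixes are ignored in this sketch."""
    core = version.lstrip("v").split("+", 1)[0].split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def pending_migrations(migrations: List[dict], current: Optional[str], target: str) -> List[str]:
    """Names of migrations whose 'to' boundary lies in (current, target], lowest boundary first."""
    if current is None:
        return []  # assumption: a first install has nothing to migrate from
    cur, tgt = parse_semver(current), parse_semver(target)
    due = []
    for m in migrations:
        if not cur < parse_semver(m["to"]) <= tgt:
            continue  # this upgrade does not cross the migration's boundary
        if "from" in m and not fnmatch.fnmatchcase(current, m["from"]):
            continue  # 'from' treated as a glob over the current version (assumption)
        due.append(m)
    due.sort(key=lambda m: parse_semver(m["to"]))
    return [m["name"] for m in due]
```

Under this reading, upgrading from 1.4.2 to 2.1.0 runs backfill-ledger-2-0, while upgrading from 2.0.0 to 2.1.0 does not, because that upgrade no longer crosses the 2.0.0 boundary.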

Rollback

spec:
  rollbackOnFailure: true       # restore last-good revisions on a failed run

Needs a rollback store configured; see rollback.

Update windows

Gate when new revisions roll out. Each window is Allow or Deny, recurring (cron) or absolute (from/to). windowScope controls how strict a closed window is.

spec:
  windowScope: Updates          # Updates (default): hold rollouts, keep correcting
                                # drift. All: a hard freeze — no applies at all.
  updateWindows:
    - type: Deny                # Deny always wins over Allow
      schedule: "0 9 * * MON-FRI"   # 5-field cron: window start
      duration: 8h
      timeZone: Europe/Berlin   # IANA tz (default UTC)
    - type: Deny                # an absolute one-off freeze
      from: 2026-12-24T00:00:00Z
      to:   2026-12-27T00:00:00Z

A recurring window uses schedule + duration; an absolute window uses from + to. See update windows.
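The window rules above (Deny beats Allow, recurring or absolute windows, and windowScope deciding whether drift correction may continue) can be sketched as a single decision function. This is an illustration, not the controller's logic: cron parsing and time zones are left out, a recurring window takes a hypothetical last_fire helper that returns the latest start at or before now, intervals are treated as half-open, and "once any Allow window exists, rollouts happen only inside one" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class Window:
    type: str                                  # "Allow" or "Deny"
    start: Optional[datetime] = None           # absolute window: from
    end: Optional[datetime] = None             # absolute window: to
    # recurring window: hypothetical helper returning the latest cron start <= now
    last_fire: Optional[Callable[[datetime], datetime]] = None
    duration: timedelta = timedelta(0)         # recurring window length

    def is_open(self, now: datetime) -> bool:
        if self.last_fire is not None:
            began = self.last_fire(now)
            return began <= now < began + self.duration
        return self.start <= now < self.end    # half-open interval (assumption)


def may_apply(windows: List[Window], now: datetime, scope: str = "Updates", is_rollout: bool = True) -> bool:
    """Decide whether an apply may run now under the configured windows."""
    if scope == "Updates" and not is_rollout:
        return True   # Updates scope: drift correction continues while rollouts are held
    if any(w.type == "Deny" and w.is_open(now) for w in windows):
        return False  # Deny always wins over Allow
    allows = [w for w in windows if w.type == "Allow"]
    # assumption: once any Allow window exists, rollouts happen only while one is open
    return not allows or any(w.is_open(now) for w in allows)


def weekdays_from_0900_utc(now: datetime) -> datetime:
    """Stand-in for cron "0 9 * * MON-FRI" in UTC: latest weekday 09:00 at or before now."""
    fire = now.replace(hour=9, minute=0, second=0, microsecond=0)
    if fire > now:
        fire -= timedelta(days=1)
    while fire.weekday() > 4:  # Saturday=5, Sunday=6
        fire -= timedelta(days=1)
    return fire
```

A usage sketch mirroring the two Deny windows above (evaluated in UTC rather than Europe/Berlin): `Window("Deny", last_fire=weekdays_from_0900_utc, duration=timedelta(hours=8))` and `Window("Deny", start=datetime(2026, 12, 24, tzinfo=timezone.utc), end=datetime(2026, 12, 27, tzinfo=timezone.utc))`.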


Stages

stages (required, at least one entry) is the ordered list of stages. A stage with every field set:

spec:
  stages:
    - name: app                 # required; DNS-label, unique in the StageSet
      sourceRef:
        name: my-app            # required
        kind: ExternalArtifact  # default; also GitRepository/OCIRepository/Bucket
                                # directly, or a producer (e.g. JsonnetSnippet)
        apiVersion: source.toolkit.fluxcd.io/v1   # required for a producer kind
        namespace: other-ns     # default: the StageSet's namespace
      path: ./overlays/prod     # path inside the artifact (default ./)
      prune: true               # GC objects that leave the stage (default true)
      timeout: 3m               # per-stage timeout (default: spec.timeout)
      force: false              # sugar for conflictPolicy.default: Recreate
      applyHelmHookResources: true  # apply helm.sh/hook objects as ordinary ones
      patches: []               # Kustomize patches applied after build
      conflictPolicy: {}        # see below
      postBuild: {}             # see below
      actions: {}               # see below
      readyChecks: {}           # see below

sourceRef.kind defaults to ExternalArtifact, so the common case is just sourceRef: { name: … }. A sourceRef resolves to a Flux artifact in one of three ways: an ExternalArtifact (RFC-0012, the default); a classic Flux source (GitRepository, OCIRepository, or Bucket) consumed directly; or any other kind, which is treated as a producer and resolved to its ExternalArtifact via the back-pointer index. See stages and sources and producer-aware sources.
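The defaulting and the three resolution paths fit in a small decision function. This is a sketch only: the route labels and the ResolvedRef type are hypothetical, the apiVersion check mirrors the "required for a producer kind" comment in the stage example, and the actual artifact lookup (including the back-pointer index) is left out.

```python
from typing import NamedTuple

CLASSIC_FLUX_SOURCES = {"GitRepository", "OCIRepository", "Bucket"}


class ResolvedRef(NamedTuple):
    route: str       # "external-artifact" | "flux-source" | "producer"
    kind: str
    namespace: str
    name: str


def classify_source_ref(ref: dict, stageset_namespace: str) -> ResolvedRef:
    """Apply the documented defaults and pick one of the three resolution paths."""
    kind = ref.get("kind", "ExternalArtifact")            # documented default kind
    namespace = ref.get("namespace", stageset_namespace)  # documented default namespace
    if kind == "ExternalArtifact":
        route = "external-artifact"   # consumed as-is
    elif kind in CLASSIC_FLUX_SOURCES:
        route = "flux-source"         # classic Flux source, consumed directly
    elif "apiVersion" not in ref:
        raise ValueError(f"sourceRef of producer kind {kind!r} needs an apiVersion")
    else:
        route = "producer"            # resolved to its ExternalArtifact via the back-pointer index
    return ResolvedRef(route, kind, namespace, ref["name"])
```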

patches

Kustomize strategic-merge or JSON6902 patches, applied after the build:

      patches:
        - patch: |
            - op: replace
              path: /spec/replicas
              value: 6
          target:
            kind: Deployment
            name: web

postBuild

Variable substitution after build and patching:

      postBuild:
        substitute:
          cluster_name: prod-eu        # inline key/value
        substituteFrom:
          - kind: ConfigMap            # required: ConfigMap or Secret
            name: cluster-vars
          - kind: Secret
            name: cluster-secrets
            optional: true             # tolerate a missing source
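Substitution amounts to merging variables from the listed sources and then replacing references in the rendered manifests. The sketch below assumes the ${name} reference syntax familiar from Flux Kustomizations and a hypothetical lookup that returns already-decoded ConfigMap or Secret data. Its precedence order (later substituteFrom entries override earlier ones; inline substitute wins) and leaving unknown references untouched are assumptions that this page does not state.

```python
import re
from typing import Callable, Dict, List, Optional

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def collect_vars(
    substitute: Dict[str, str],
    substitute_from: List[dict],
    lookup: Callable[[str, str], Optional[Dict[str, str]]],  # hypothetical: (kind, name) -> decoded data
) -> Dict[str, str]:
    merged: Dict[str, str] = {}
    for ref in substitute_from:
        data = lookup(ref["kind"], ref["name"])
        if data is None:
            if ref.get("optional", False):
                continue  # optional: a missing ConfigMap/Secret is tolerated
            raise LookupError(f"{ref['kind']}/{ref['name']} not found")
        merged.update(data)    # assumption: later entries override earlier ones
    merged.update(substitute)  # assumption: inline values win over substituteFrom
    return merged


def substitute_vars(text: str, variables: Dict[str, str]) -> str:
    """Replace ${name} references; unknown names are left untouched in this sketch."""
    return _VAR.sub(lambda m: variables.get(m.group(1), m.group(0)), text)
```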

conflictPolicy

Per-resource handling of apply conflicts (immutable fields, ownership):

      conflictPolicy:
        default: Fail                  # Fail (default) | Recreate | KeepExisting
        rules:
          - target:                    # partial selector; unset fields match all
              apiVersion: batch/v1
              kind: Job
            action: Recreate
          - target:
              kind: PersistentVolumeClaim
              name: scratch
            action: Recreate
            allowDataLoss: true        # required to Recreate a PVC/PV

See conflict policies.
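Rule selection can be read as follows: unset target fields match anything, force turns the default into Recreate, and a Recreate on a PersistentVolumeClaim or PersistentVolume needs allowDataLoss. The sketch encodes that reading with objects reduced to flat apiVersion/kind/name dicts. First-match rule ordering and falling back to Fail when the data-loss guard trips are assumptions, and the function names are hypothetical.

```python
from typing import Dict

DATA_BEARING_KINDS = {"PersistentVolumeClaim", "PersistentVolume"}


def target_matches(target: Dict[str, str], obj: Dict[str, str]) -> bool:
    # partial selector: every field set on the target must match; unset fields match all
    return all(obj.get(field) == value for field, value in target.items())


def conflict_action(policy: dict, obj: Dict[str, str], force: bool = False) -> str:
    """Return Fail, Recreate or KeepExisting for an apply conflict on obj."""
    action = "Recreate" if force else policy.get("default", "Fail")  # force is sugar for default: Recreate
    allow_data_loss = False
    for rule in policy.get("rules", []):
        if target_matches(rule.get("target", {}), obj):
            action = rule["action"]
            allow_data_loss = rule.get("allowDataLoss", False)
            break  # assumption: first matching rule wins
    if action == "Recreate" and obj.get("kind") in DATA_BEARING_KINDS and not allow_data_loss:
        return "Fail"  # assumption: refuse to delete stored data without an explicit opt-in
    return action
```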

readyChecks

Gate when the stage counts as complete:

      readyChecks:
        timeout: 5m
        disableWait: false             # true = apply without waiting for readiness
        checks:                        # explicit objects, evaluated with kstatus
          - apiVersion: apiextensions.k8s.io/v1
            kind: CustomResourceDefinition
            name: ledgers.example
        exprs:                         # custom health via CEL expressions (healthCheckExprs shape)
          - apiVersion: db.example/v1
            kind: Database
            current: "status.phase == 'Running'"
            inProgress: "status.phase in ['Pending','Provisioning']"
            failed: "status.phase == 'Failed'"

Health expressions use CEL. See ready checks.
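Each exprs entry maps an object to Current, InProgress or Failed, and readyChecks.timeout bounds the wait. Because CEL is not in the Python standard library, the sketch below uses plain Python predicates as stand-ins for the CEL strings in the example. The evaluation order, the "Unknown" result and the polling loop are illustrative assumptions, not the controller's kstatus integration.

```python
import time
from typing import Callable, Optional

Expr = Callable[[dict], bool]  # a Python predicate standing in for a CEL expression


def evaluate_health(obj: dict, current: Expr, in_progress: Optional[Expr] = None,
                    failed: Optional[Expr] = None) -> str:
    # evaluation order here (failed, current, inProgress) is this sketch's own choice
    if failed is not None and failed(obj):
        return "Failed"
    if current(obj):
        return "Current"
    if in_progress is not None and in_progress(obj):
        return "InProgress"
    return "Unknown"  # nothing matched; keep polling until the timeout


def wait_ready(fetch: Callable[[], dict], exprs: dict, timeout_s: float, poll_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch() until the object is Current or Failed, or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        state = evaluate_health(fetch(), **exprs)
        if state in ("Current", "Failed"):
            return state
        if clock() >= deadline:
            return "Timeout"
        sleep(poll_s)
```

For the Database example, current would be a predicate like `lambda o: o.get("status", {}).get("phase") == "Running"`.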


Actions

stages[].actions (and migrations[].actions) carry typed steps. Each Action has a name, optional timeout/retries, and exactly one operation block.

      actions:
        pre:        # before apply; failure aborts the stage with nothing applied
          - name: db-migrate
            timeout: 10m
            retries: 2
            job:
              sourceRef: { name: my-app-migrations }
              path: ./jobs
        post:       # after verify; the stage is Ready only if these pass
          - name: smoke-test
            http:
              url: https://my-app.internal/healthz
              method: GET                    # default POST
              expectedStatus: [200]          # default: any 2xx
              headersFrom:
                - name: gate-token
                  key: token
        onFailure:  # best-effort on any failure from apply onward
          - name: page-oncall
            http:
              url: https://alerts.internal/stageset-failed
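The pre/post/onFailure contract described in the comments above can be sketched as a small driver. Action, run_with_retries and run_stage are hypothetical names; each operation is reduced to a callable, per-action timeouts are not modelled, and retries is read as extra attempts after the first (an assumption).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    name: str
    run: Callable[[], None]  # stand-in for the operation block (http, job, patch, ...)
    retries: int = 0         # extra attempts after the first (assumption)


def run_with_retries(action: Action) -> None:
    last_error: Exception = RuntimeError("unreachable")
    for _ in range(action.retries + 1):
        try:
            action.run()
            return
        except Exception as err:  # any exception counts as a failed attempt
            last_error = err
    raise RuntimeError(f"action {action.name} failed after {action.retries + 1} attempts") from last_error


def run_stage(pre: List[Action], apply: Callable[[], None], verify: Callable[[], None],
              post: List[Action], on_failure: List[Action]) -> str:
    for action in pre:
        run_with_retries(action)  # a pre failure aborts before anything is applied
    try:
        apply()
        verify()
        for action in post:
            run_with_retries(action)  # the stage is Ready only if every post action passes
    except Exception:
        for action in on_failure:
            try:
                run_with_retries(action)
            except Exception:
                pass  # onFailure is best-effort; its own errors are swallowed
        raise
    return "Ready"
```

Note that in this sketch a pre failure does not trigger onFailure, matching "any failure from apply onward".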

The six operation types — one per Action:

# patch — patch an existing object
- name: enable-traffic
  patch:
    target: { apiVersion: v1, kind: Service, name: web }
    type: merge                # merge (default) | json6902
    patch: '{ "spec": { "selector": { "release": "green" } } }'

# http — call an endpoint (hosts gated by --allowed-action-hosts)
- name: approve
  http:
    url: https://gate.internal/approve
    bodyFrom: { name: approve-secret, key: body }

# wait — block for a duration or until a CEL expr holds
- name: settle
  wait:
    duration: 30s
- name: until-available
  wait:
    target: { apiVersion: apps/v1, kind: Deployment, name: web }
    expr: "status.availableReplicas >= 3"
    timeout: 5m

# job — render and await Jobs from an artifact
- name: migrate
  job:
    sourceRef: { name: my-app-migrations }
    path: ./jobs

# delete — remove an existing object (missing = success)
- name: drop-legacy
  delete:
    target: { apiVersion: batch/v1, kind: Job, name: legacy-migration }

# apply — transient, rollout-scoped manifests (NOT inventory-tracked, never pruned)
- name: canary
  apply:
    sourceRef: { name: my-app-canary }
    path: ./
    wait: true                 # block until applied objects report Ready

See actions.
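Three rules stated on this page are easy to check mechanically: an Action sets exactly one of the six operation blocks, http defaults to POST, and expectedStatus defaults to any 2xx. A validation sketch covering them follows, plus a host check that assumes --allowed-action-hosts is a plain list of exact hostnames (its real matching rules are not described here). All function names are hypothetical.

```python
from typing import Iterable, Optional, Sequence
from urllib.parse import urlsplit

OPERATIONS = ("patch", "http", "wait", "job", "delete", "apply")


def operation_of(action: dict) -> str:
    """Return the single operation block an Action sets, or raise."""
    present = [op for op in OPERATIONS if op in action]
    if len(present) != 1:
        raise ValueError(f"action {action.get('name')!r} must set exactly one of {OPERATIONS}, got {present}")
    return present[0]


def http_method(http: dict) -> str:
    return http.get("method", "POST")  # documented default


def status_accepted(code: int, expected: Optional[Sequence[int]] = None) -> bool:
    # documented default: any 2xx when expectedStatus is unset
    return code in expected if expected else 200 <= code < 300


def host_allowed(url: str, allowed_hosts: Iterable[str]) -> bool:
    # assumption: --allowed-action-hosts is an exact hostname allowlist
    host = urlsplit(url).hostname
    return host is not None and host in set(allowed_hosts)
```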


status

status is controller-owned and read-only. A representative snapshot:

status:
  observedGeneration: 7
  conditions:
    - type: Ready
      status: "True"
      reason: Succeeded
      message: All 2 stages applied
  lastHandledReconcileAt: "2026-06-15T09:21:04Z"
  lastAttemptedRevisions: { payments/payments-app: sha256:1a2b }
  lastAppliedRevisions:   { payments/payments-app: sha256:1a2b }
  version: "2.1.0"
  pendingMigrations: []
  executedMigrations: []
  inventoryMode: hybrid
  stages:
    - name: infrastructure
      phase: Ready             # Pending|Applying|Pruning|Verifying|Ready|Failed
      appliedRevision: sha256:9f3c
      entriesCount: 12
      shards: 1
      message: ""
      executedActions: []
      ledgerRevision: sha256:9f3c
  lastAppliedSnapshot:
    - stage: infrastructure
      url: http://source-controller.../infra.tar.gz
      digest: sha256:9f3c
  pendingUpdate:               # set only when a window holds a rollout
    revisions: { payments/payments-app: sha256:cafe }
    nextWindowOpens: "2026-06-16T08:00:00Z"
  lastHandledUpdateOverride: "2026-06-15T09:30:00Z"

The Ready condition’s reason is one of the wire-stable values documented in the runbooks.