Rollouts

Progressive delivery flow

Progressive delivery checkpoint

By now we have GitOps delivery, global traffic, secrets, policies, metrics, logs, and traces. That gives enough visibility to decide whether a change is healthy. The missing piece is a controlled way to pause, inspect, promote, or abort a workload change.

Before this checkpoint, Mandelbrot was a normal Kubernetes Deployment. A bad change could still be fixed through Git: revert the commit and let Argo CD reconcile. That is a valid basic rollback path, but it does not show progressive delivery.

This checkpoint adds Argo Rollouts and changes Mandelbrot from a Deployment to a Rollout. The goal is deliberately small:

  • install the Argo Rollouts controller in every cluster
  • run Mandelbrot with two replicas
  • pause a canary at 50 percent
  • inspect the live system with the signals already built
  • promote or abort from a controlled operator workflow

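No service mesh is involved. Traffic is still balanced by a plain Kubernetes Service rather than split by a mesh or a provider-specific router, so the 50 percent weight is approximated by the ratio of canary pods to stable pods. That is enough to prove the operating motion.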
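The pod-count arithmetic behind a 50 percent step can be sketched as follows. The function name is hypothetical, and the round-up rule is a simplified model of the controller's behavior, which is also bounded by its surge and unavailability settings.

```shell
# canary_pods REPLICAS WEIGHT
# Prints the approximate number of canary pods for a setWeight step,
# assuming the weight is rounded up to a whole pod (simplified model).
canary_pods() {
  replicas="$1"
  weight="$2"
  # Integer ceiling of replicas * weight / 100.
  echo $(( (replicas * weight + 99) / 100 ))
}

canary_pods 2 50    # prints 1: one canary pod next to one stable pod
canary_pods 2 100   # prints 2: the new version runs on every pod
```

With only two replicas, 50 percent is the only intermediate weight that maps cleanly to whole pods, which is why the canary has a single pause.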

GitOps ownership

Argo CD installs Argo Rollouts through one application per cluster:

trinity-rollouts-aws
trinity-rollouts-gcp
trinity-rollouts-azure

The AWS application is representative:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: trinity-rollouts-aws
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
spec:
  project: trinity
  source:
    repoURL: https://argoproj.github.io/argo-helm
    chart: argo-rollouts
    targetRevision: 2.40.9
    helm:
      releaseName: argo-rollouts
      values: |
        controller:
          replicas: 1
        dashboard:
          enabled: false
  destination:
    server: https://kubernetes.default.svc
    namespace: argo-rollouts

The real manifest also pins controller resources. The dashboard is disabled because this checkpoint uses the CLI workflow rather than exposing another UI. The Argo CD project had to allow the Argo Helm repository and the Rollouts analysis kinds:

sourceRepos:
  - https://github.com/maxgherman/trinity.git
  - https://argoproj.github.io/argo-helm
clusterResourceWhitelist:
  - group: argoproj.io
    kind: AnalysisTemplate
  - group: argoproj.io
    kind: ClusterAnalysisTemplate

The Mandelbrot application's sync options also gain one important entry, SkipDryRunOnMissingResource=true:

syncOptions:
  - CreateNamespace=true
  - ApplyOutOfSyncOnly=true
  - SkipDryRunOnMissingResource=true

SkipDryRunOnMissingResource=true avoids a dry-run failure when Argo CD sees the Rollout manifest before the Rollouts CRD is established. It is a safety net rather than the ordering mechanism: the workflow still syncs the root application and waits for the CRD before it asks Mandelbrot to sync.

Mandelbrot Rollout

The workload changes from Deployment to Rollout:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: mandelbrot
  namespace: mandelbrot
spec:
  replicas: 2
  strategy:
    canary:
      steps:
        - setWeight: 50
        - pause: {}
        - setWeight: 100

The canary is intentionally simple. With two replicas, the rollout can hold one stable pod and one canary pod while an operator checks the system. The checks use the tools that the previous chapters already built:

  • UI behavior through Front Door
  • /api/meta
  • Prometheus metrics
  • Loki logs
  • Jaeger or Grafana Cloud traces
  • Argo CD and Rollout status

The pod template also gets a release marker:

env:
  - name: MANDELBROT_RELEASE
    value: continuous

The app exposes that value through /api/meta:

{
  "cloud": "aws",
  "region": "us-east-1",
  "release": "continuous",
  "pod": "mandelbrot-69d45cf95f-khkkh",
  "route": ["aws", "gcp", "azure"]
}

That detail matters because the app source is mounted from a ConfigMap. A ConfigMap-only change leaves the pod template untouched, so it does not create a new ReplicaSet. Changing MANDELBROT_RELEASE, or any other pod-template field, changes the template, and the Rollouts controller responds with a new ReplicaSet and a real rollout.
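The text changes MANDELBROT_RELEASE by hand. A common alternative, which is not what this chapter does, is to derive a marker from the mounted source so that any content change also changes the pod template. A minimal sketch, assuming sha256sum from GNU coreutils is available; the function name source_marker is hypothetical.

```shell
# source_marker FILE...
# Prints a short, content-derived marker for the given files.
# The marker changes whenever the bytes of any file change.
source_marker() {
  cat -- "$@" | sha256sum | cut -c 1-10
}
```

The printed value could then be written into MANDELBROT_RELEASE or a pod-template annotation before the manifests are committed. The hand-picked names used in this chapter (stable, continuous) are easier to recognize in /api/meta, which is a reasonable trade for a demo workload.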

Operator workflow

The operator path moves into GitHub Actions. The Mandelbrot Rollout workflow authenticates to AWS, GCP, and Azure with the same GitHub OIDC identities as the deployment workflow. It installs the kubectl argo rollouts plugin and supports these operations:

status
sync
promote
promote-full
abort
undo
restart
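Most of these operations correspond to subcommands of the kubectl argo rollouts plugin; sync is the exception because it goes through Argo CD. The mapping might look like the sketch below. It assumes the Rollout is named mandelbrot, as in this chapter. The function name rollout_args is hypothetical, and the real workflow may be structured differently.

```shell
# rollout_args OPERATION [REVISION]
# Prints the plugin arguments for one plugin-backed workflow operation.
# Returns non-zero for operations the plugin does not handle (such as sync).
rollout_args() {
  operation="$1"
  revision="${2:-}"
  case "$operation" in
    status)       echo "status mandelbrot" ;;
    promote)      echo "promote mandelbrot" ;;
    promote-full) echo "promote mandelbrot --full" ;;
    abort)        echo "abort mandelbrot" ;;
    restart)      echo "restart mandelbrot" ;;
    undo)
      # The revision is optional; without it the plugin picks the previous one.
      if [ -n "$revision" ]; then
        echo "undo mandelbrot --to-revision=$revision"
      else
        echo "undo mandelbrot"
      fi
      ;;
    *)
      echo "unsupported rollout operation: $operation" >&2
      return 1
      ;;
  esac
}
```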

The workflow is manually triggered and gated by the infra-deploy-approval environment. That keeps rollout promotion in an explicit operator path instead of hiding it in an automatic post-merge job. The workflow configures the selected cluster context, syncs the root application when needed, and waits for the Rollouts CRD:

kubectl --context "${context}" -n argocd patch application "trinity-${TRINITY_ENVIRONMENT}-${cloud}-root" \
  --type merge \
  -p '{"operation":{"sync":{"syncStrategy":{"hook":{}}}}}'

kubectl --context "${context}" wait crd/rollouts.argoproj.io \
  --for=condition=Established \
  --timeout=30s

Then the sync operation patches the Mandelbrot Argo CD application and waits until the Rollout is either paused or healthy:

kubectl --context "${context}" -n argocd patch application "trinity-mandelbrot-${cloud}" \
  --type merge \
  -p '{"operation":{"sync":{"syncStrategy":{"hook":{}}}}}'
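The wait itself is not shown in the snippet above. One way to sketch it is a polling loop over the Rollout's status.phase field. The function names, attempt count, and delay are assumptions, not the workflow's actual values.

```shell
# get_phase CONTEXT
# Prints the current phase of the mandelbrot Rollout (for example
# Progressing, Paused, Healthy, or Degraded).
get_phase() {
  kubectl --context "$1" -n mandelbrot get rollout mandelbrot \
    -o jsonpath='{.status.phase}'
}

# wait_paused_or_healthy CONTEXT [ATTEMPTS] [DELAY_SECONDS]
# Polls until the Rollout is Paused or Healthy, then prints that phase.
# Returns non-zero if neither phase is seen within ATTEMPTS polls.
wait_paused_or_healthy() {
  context="$1"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(get_phase "$context")"
    case "$phase" in
      Paused|Healthy)
        echo "$phase"
        return 0
        ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "rollout did not reach Paused or Healthy (last phase: ${phase:-unknown})" >&2
  return 1
}
```

Treating Paused as success is what lets the workflow stop at the canary step and hand control back to the operator.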

Promotion is explicit:

kubectl argo rollouts --context "${context}" promote mandelbrot \
  --namespace mandelbrot

There was one real workflow bug during the first pass. kubectl does not accept flags, including --context, before a plugin name:

kubectl --context aws argo rollouts get rollout mandelbrot

That fails with:

flags cannot be placed before plugin name: --context

The fixed form passes --context after the plugin command:

kubectl argo rollouts --context aws get rollout mandelbrot \
  --namespace mandelbrot
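To keep the flag order from regressing, the call can be wrapped in a small helper that always places --context after the plugin name. This is a sketch; the helper name rollouts is hypothetical.

```shell
# rollouts CONTEXT ARGS...
# Runs the Argo Rollouts plugin against one cluster context, with the
# context and namespace flags placed after the plugin name.
rollouts() {
  ctx="$1"
  shift
  kubectl argo rollouts --context "$ctx" "$@" --namespace mandelbrot
}

# Example: rollouts aws get rollout mandelbrot
```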

Release drill

A normal release is still Git-first:

  1. Merge the application change.
  2. Run Mandelbrot Rollout with operation: sync and cloud: all.
  3. Inspect the paused 50 percent canary.
  4. Run operation: promote when the canary is good.

The first adoption of the Rollout may go straight to Healthy, because there is no previous stable ReplicaSet to canary against. The pause becomes visible on the next pod-template change.

That mattered in the first real test. A mounted ConfigMap change alone would not prove the canary machinery, so the branch changed both user-visible behavior and the pod template. The UI now renders continuously instead of drawing one sample, and MANDELBROT_RELEASE changed from stable to continuous.

The validated drill was:

  1. Merge the continuous-render change to main.
  2. Run operation: sync, cloud: all.
  3. Confirm each rollout pauses with two desired replicas, one stable pod and one canary pod.
  4. Open the app through Front Door and verify continuous rendering while the rollout is paused.
  5. Run operation: promote, cloud: all.
  6. Confirm the rollouts and Argo CD applications return to Healthy.

One later follow-up reduced the idle delay between continuous render cycles. That changed MANDELBROT_RELEASE to faster-cycles, and /api/meta through Front Door reported the new release from a live pod:

{"cloud":"aws","region":"us-east-1","release":"faster-cycles","pod":"mandelbrot-69d45cf95f-khkkh","route":["aws","gcp","azure"]}
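A check like this can be scripted. The sketch below pulls the release field out of an /api/meta response and compares it with an expected value. It assumes the flat JSON shape shown above and uses grep and cut to avoid extra dependencies, although a JSON parser such as jq would be more robust. Both function names are hypothetical.

```shell
# release_from_meta
# Reads an /api/meta JSON document on stdin and prints its "release" value.
release_from_meta() {
  grep -o '"release"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 | cut -d '"' -f 4
}

# expect_release EXPECTED
# Succeeds only if the document on stdin reports the expected release.
expect_release() {
  actual="$(release_from_meta)"
  if [ "$actual" = "$1" ]; then
    echo "release ok: $actual"
  else
    echo "release mismatch: expected $1, got ${actual:-<none>}" >&2
    return 1
  fi
}
```

In practice the input would come from a request against the Front Door endpoint, piped into expect_release faster-cycles. While a canary is paused, repeated requests can land on either the stable or the canary pod, so a single mismatch does not necessarily mean a failure.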

After promotion, all three clusters reached a healthy rollout state:

cloud  status   step  weight  desired  ready  stable replica set
aws    Healthy  3/3   100     2        2      mandelbrot-69d45cf95f
gcp    Healthy  3/3   100     2        2      mandelbrot-bbb879c59
azure  Healthy  3/3   100     2        2      mandelbrot-7c45d9dfdb

Rollback

If the canary is bad, use the operator workflow with:

operation: abort
cloud: all

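That stops the in-progress rollout, scales the canary back down, and leaves the stable ReplicaSet serving traffic. An aborted Rollout reports Degraded until its desired pod template matches the stable one again.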

For a durable GitOps rollback, revert the bad Git commit and run:

operation: sync
cloud: all

The workflow also exposes undo as an emergency escape hatch:

operation: undo
cloud: all
to_revision: <optional-rollout-revision>

That can move the live Rollout back to a previous revision, but it does not replace the need to fix Git. With self-heal enabled, Argo CD reconciles the Rollout back to the revision declared in the repository, so an undo that is not followed by a Git fix will not last. For the platform's normal operating model, Git remains the durable source of truth.

CI check

CI had to understand the new manifest kinds. The manifest checker now allows the Argo Rollouts kinds:

AnalysisTemplate
ClusterAnalysisTemplate
Rollout

The Mandelbrot overlays still render through Kustomize:

kubectl kustomize apps/mandelbrot/overlays/aws
kubectl kustomize apps/mandelbrot/overlays/gcp
kubectl kustomize apps/mandelbrot/overlays/azure

That catches the basic shape before Argo CD or the rollout controller sees the change.
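The manifest checker itself is not shown here. A minimal sketch of a kind allow-list check over rendered output might look like this. The function name and the way the allow-list is passed are assumptions.

```shell
# disallowed_kinds "KIND KIND ..."
# Reads rendered YAML on stdin and prints every top-level kind that is
# not in the space-separated allow-list. Returns non-zero if any is found.
disallowed_kinds() {
  allowed=" $1 "
  status=0
  # Only column-0 "kind:" lines are considered, so nested kind fields
  # (for example inside a whitelist entry) are ignored.
  for kind in $(sed -n 's/^kind:[[:space:]]*//p' | sort -u); do
    case "$allowed" in
      *" $kind "*) ;;
      *)
        echo "$kind"
        status=1
        ;;
    esac
  done
  return "$status"
}
```

Piping the output of kubectl kustomize for each overlay into this function, with an allow-list that includes Rollout, AnalysisTemplate, and ClusterAnalysisTemplate, would fail the build on any unexpected kind.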

Exit

This closes the original delivery loop. The platform can now deploy application changes through Git, route traffic globally, observe behavior across clusters, enforce a small policy baseline, and promote or abort a canary.

The rollout setup is intentionally modest. It does not use mesh traffic splitting or automated metric analysis yet. Those would be natural next steps, but the platform now has the essential operator motion: change, pause, inspect, promote, abort, and recover through Git.

Source code

Reference implementation