The Trinity

Published: 24 Apr 2026 · 4 min read

The trinity of cloud providers

It's been a while since I stretched my infra muscles. I was looking for some exercise and that got me thinking.

Application code is easy to revisit in small slices. Infrastructure is not. Leave Kubernetes, cloud networking, GitOps, observability, and delivery machinery alone for long enough, and the surface area moves under your feet. Managed services change, defaults change, and patterns that once felt sensible begin to look suspiciously dated.

I wanted an exercise with enough weight to be interesting and grounded in realistic platform work, one that reflects the kind of problem an experienced infrastructure engineer should be able to reason about: reproducibility, traffic management, policy, secrets, observability, and failure.

So I wrote one. It covers all of those points while leaving enough room for different implementations.

Multi-Cloud Platform Engineering Exercise Spec #

1. Purpose #

This exercise is designed to test modern infrastructure and platform engineering skills at a level comparable to “Kubernetes across three cloud providers,” but in a way that is more realistic, more useful, and more aligned with how production platforms are commonly built.

Instead of stretching one Kubernetes cluster across multiple clouds, the candidate will build three separate clusters—one in AWS, one in GCP, and one in Azure—and operate them as a coherent platform using GitOps, infrastructure as code, policy controls, observability, and traffic failover.

This exercise is intended to evaluate practical judgment as much as technical execution.

2. Summary #

Design and implement a multi-cloud, multi-cluster application platform spanning:

AWS: 1 Kubernetes cluster
GCP: 1 Kubernetes cluster
Azure: 1 Kubernetes cluster

A sample application must be deployed consistently to all three clusters. The platform must support:

Declarative infrastructure provisioning
GitOps-based application and platform delivery
Centralized observability
Secret management
Global traffic routing and failover
Progressive delivery or controlled rollout
Basic policy enforcement
Operational documentation

3. Scenario #

You are building the infrastructure platform for a fictional SaaS product with the following requirements:

It serves users globally.
It must continue operating if one cloud provider becomes unavailable.
Platform changes must be auditable and reproducible.
Application teams should be able to deploy safely without manually editing clusters.
Operators need visibility across all environments.
Secrets and configuration should be handled securely.
The design should be extensible toward a future internal developer platform.

You must propose and implement a platform architecture that satisfies these goals.

4. What This Exercise Is Testing #

This exercise is intended to test:

Infrastructure architecture judgment
Kubernetes operations across multiple environments
CI/CD and GitOps maturity
Cloud networking fundamentals
Traffic management and reliability design
Secrets and identity strategy
Observability design
Policy and governance
Failure handling and operational thinking
Documentation quality

It is not intended to reward unnecessary complexity, novelty for its own sake, or brittle “hero architecture.”

5. Logical Architecture #

                +----------------------+
                |      Git Repos       |
                |  infra / platform /  |
                |      application     |
                +----------+-----------+
                           |
                           v
                +----------------------+
                |   CI / Validation    |
                | lint, preview, test, |
                | policy, image build  |
                +----------+-----------+
                           |
                           v
                +-----------------------+
                | GitOps Control Layer  |
                | CD desired state      |
                | reconciliation        |
                +-----+---------------+-+
                      |               |
          ------------+---------------+-------------+
          |                           |             |
          v                           v             v
     +----------------+    +----------------+   +----------------+
     | AWS / EKS      |    | GCP / GKE      |   | Azure / AKS    |
     | app + ingress  |    | app + ingress  |   | app + ingress  |
     | metrics/logs   |    | metrics/logs   |   | metrics/logs   |
     +-------+--------+    +-------+--------+   +-------+--------+
             \                       |                  /
              \                      |                 /
               \                     |                /
                +----------------------------------------+
                | Global DNS / Traffic Steering / Health |
                +----------------------------------------+

6. Mandatory Requirements #

The solution must include all of the following.

Infrastructure as Code #

All cloud infrastructure must be provisioned declaratively or through a reproducible infrastructure-as-code workflow.

Minimum scope:

Kubernetes clusters
Networking required for cluster access
DNS or traffic infrastructure
Secret backends or secret integration
Observability infrastructure if applicable

Three Clusters #

Create one Kubernetes cluster in each of:

AWS
GCP
Azure

Managed Kubernetes services (EKS, GKE, and AKS, as in the diagram above) are acceptable and encouraged.

GitOps #

A GitOps operator must continuously reconcile at least:

namespaces
core platform components
application manifests or Helm releases
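
To make "continuously reconcile" concrete, here is a minimal sketch of the comparison a GitOps operator performs on every loop: desired state from Git against live state in the cluster. It is not how any particular operator is implemented. The resource keys, the content hashes, and the `plan_reconcile` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Actions a reconciler would take to converge a cluster on Git."""
    create: list[str]
    update: list[str]
    prune: list[str]


def plan_reconcile(desired: dict[str, str], live: dict[str, str]) -> Plan:
    """Compare desired state (from Git) with live state (from the cluster).

    Both maps go from a resource key such as "Namespace/shop" to a content
    hash of its manifest. Key format and hashes are illustrative only.
    """
    create = sorted(k for k in desired if k not in live)
    update = sorted(k for k in desired if k in live and desired[k] != live[k])
    prune = sorted(k for k in live if k not in desired)
    return Plan(create, update, prune)


desired = {
    "Namespace/shop": "a1",
    "HelmRelease/ingress": "b2",
    "Deployment/shop-api": "c3",
}
live = {
    "Namespace/shop": "a1",
    "Deployment/shop-api": "c0",  # drifted: edited by hand in the cluster
    "ConfigMap/debug": "zz",      # not in Git, so it would be pruned
}

plan = plan_reconcile(desired, live)
print(plan)
```

Running the same comparison against each of the three clusters is what keeps them consistent: drift in any one of them shows up as a non-empty plan.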

Sample Application #

Deploy a non-trivial sample application to all clusters.

Minimum:

frontend
API
health endpoints

Global Traffic Strategy #

Users must reach the application through a single public entry point.

The implementation must support one of:

weighted routing
latency-based routing
active/passive failover
health-based DNS failover

You must document:

how routing decisions work
how health is determined
how failover is tested
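
The three documentation points above can be sketched in a few lines. This is a toy model of weighted routing combined with health-based failover, not the behaviour of any specific DNS or traffic product. The weights, the three-consecutive-failures threshold, and the function names are assumptions chosen for the example.

```python
import random


def is_healthy(probes: list[bool], failure_threshold: int = 3) -> bool:
    """Unhealthy once the last `failure_threshold` probes have all failed.

    An endpoint with fewer probes than the threshold counts as healthy.
    """
    recent = probes[-failure_threshold:]
    return not (len(recent) == failure_threshold and not any(recent))


def healthy_weights(weights: dict[str, int], probes: dict[str, list[bool]]) -> dict[str, int]:
    """Keep the configured weight of every endpoint that is still healthy."""
    return {
        name: weight
        for name, weight in weights.items()
        if weight > 0 and is_healthy(probes.get(name, []))
    }


def pick_endpoint(weights, probes, rng: random.Random) -> str:
    """Weighted random choice among healthy endpoints only."""
    candidates = healthy_weights(weights, probes)
    if not candidates:
        raise RuntimeError("no healthy endpoint to route to")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


# Failover test: AWS has failed its last three probes.
weights = {"aws": 50, "gcp": 30, "azure": 20}
probes = {
    "aws": [True, False, False, False],
    "gcp": [True, True, True],
    "azure": [False, True, True],
}
rng = random.Random(0)
picks = [pick_endpoint(weights, probes, rng) for _ in range(1000)]
print({name: picks.count(name) for name in weights})
```

In this model, the failover test amounts to forcing one cluster's probes to fail and checking that it receives no traffic while the remaining weights keep their ratio.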

Observability #

Provide centralized or federated observability across clusters.

Minimum:

application metrics
infrastructure metrics
logs
distributed tracing or trace-ready instrumentation

Secrets Management #

Secrets must not be hardcoded in manifests or repos.

You must explain:

where secrets live
how they are synced into clusters
how rotation would work
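
One way to reason about the sync and rotation questions is a versioned external store plus a periodic sync into each cluster. The sketch below is an in-memory stand-in under that assumption. `SecretStore` and `sync` are hypothetical names, not the API of any real secret manager or operator.

```python
from dataclasses import dataclass, field


@dataclass
class SecretStore:
    """In-memory stand-in for an external secret manager (hypothetical)."""
    versions: dict[str, list[str]] = field(default_factory=dict)

    def put(self, name: str, value: str) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.setdefault(name, []).append(value)
        return len(self.versions[name])

    def latest(self, name: str) -> tuple[int, str]:
        """Return (version, value) for the newest version of a secret."""
        values = self.versions[name]
        return len(values), values[-1]


def sync(store: SecretStore, cluster: dict[str, tuple[int, str]], names: list[str]) -> list[str]:
    """Copy the latest version of each named secret into the cluster copy.

    Returns the names that changed, which is the signal a platform would
    use to restart or roll the workloads that consume them.
    """
    changed = []
    for name in names:
        version, value = store.latest(name)
        if cluster.get(name, (0, ""))[0] != version:
            cluster[name] = (version, value)
            changed.append(name)
    return changed


store = SecretStore()
store.put("db-password", "placeholder-v1")
cluster: dict[str, tuple[int, str]] = {}

first = sync(store, cluster, ["db-password"])   # initial sync
second = sync(store, cluster, ["db-password"])  # nothing changed
store.put("db-password", "placeholder-v2")      # rotation in the store
third = sync(store, cluster, ["db-password"])   # picks up the new version
print(first, second, third)
```

Only secret names live in Git. Values stay in the store, and rotation is a write to the store followed by the next sync.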

Basic Policy Enforcement #

Implement at least two policy controls.

Examples:

require CPU/memory requests and limits
block privileged containers
require approved image registries
require labels/ownership metadata
block latest image tags
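
The example controls above can be expressed as simple checks over a pod manifest. The sketch below runs them against a plain dictionary and is meant to show the logic, not to replace an admission controller. The approved registry, the required label, and the helper names are assumptions for illustration.

```python
APPROVED_REGISTRIES = ("registry.example.com/",)  # hypothetical registry
REQUIRED_LABELS = ("team",)                       # hypothetical ownership label


def image_tag(image: str) -> str:
    """Return the tag of an image reference, or "latest" if none is given."""
    last = image.rsplit("/", 1)[-1]
    if "@" in last:
        return ""  # pinned by digest, so there is no tag to check
    return last.split(":", 1)[1] if ":" in last else "latest"


def violations(pod: dict) -> list[str]:
    """List every policy violation found in a pod manifest."""
    found = []
    labels = pod.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            found.append(f"missing label: {label}")
    for c in pod.get("spec", {}).get("containers", []):
        name = c.get("name", "?")
        image = c.get("image", "")
        if not image.startswith(APPROVED_REGISTRIES):
            found.append(f"{name}: image not from an approved registry")
        if image_tag(image) == "latest":
            found.append(f"{name}: image uses the latest tag")
        if c.get("securityContext", {}).get("privileged", False):
            found.append(f"{name}: privileged container")
        resources = c.get("resources", {})
        for kind in ("requests", "limits"):
            for res in ("cpu", "memory"):
                if res not in resources.get(kind, {}):
                    found.append(f"{name}: missing {kind}.{res}")
    return found


bad_pod = {
    "metadata": {"labels": {}},
    "spec": {"containers": [{
        "name": "api",
        "image": "docker.io/acme/api",  # wrong registry, implicit latest tag
        "securityContext": {"privileged": True},
        "resources": {"requests": {"cpu": "100m"}},
    }]},
}
for problem in violations(bad_pod):
    print(problem)
```

Running the same checks in CI and at admission time catches violations before merge and again before anything reaches a cluster.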

Reliability Demonstration #

Demonstrate at least one of:

traffic failover when one cluster is unavailable
controlled rollout / canary
rollback after a failed deployment
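
For the canary and rollback options, the decision logic can be sketched as a loop over traffic steps that stops at the first unhealthy one. The step percentages, the 1% error threshold, and the `error_rate_at` callback are assumptions. A real rollout controller would read these numbers from metrics instead.

```python
from typing import Callable


def run_canary(
    steps: list[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> tuple[str, int]:
    """Walk through traffic percentages and roll back on the first bad step.

    `error_rate_at(percent)` returns the canary error rate observed while
    the canary receives `percent` of traffic. Returns ("promoted", last
    step) on success or ("rolled_back", failing step) on failure.
    """
    if not steps:
        raise ValueError("at least one rollout step is required")
    for percent in steps:
        if error_rate_at(percent) > max_error_rate:
            return ("rolled_back", percent)
    return ("promoted", steps[-1])


# A healthy release stays under the threshold at every step.
healthy = run_canary([10, 50, 100], lambda percent: 0.001)

# A release that only breaks under load fails at the 50% step.
breaks_under_load = run_canary(
    [10, 50, 100], lambda percent: 0.05 if percent >= 50 else 0.0
)
print(healthy, breaks_under_load)
```

The second case is the one worth demonstrating: a defect that only appears under higher load is caught before the release takes all traffic.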

Documentation #

You must provide:

architecture overview
deployment instructions
repo structure explanation
operational runbook
known tradeoffs
future improvements

7. Nice-to-Have Requirements #

These are optional, but valuable.

service mesh with mTLS
workload identity / IRSA / federated identity
SLOs and alerting
per-team namespaces and RBAC model
reusable Pulumi components
reusable Helm chart or Kustomize base
Backstage integration
cost controls / autoscaling strategy
chaos testing
DR strategy notes

Solution #

Here is my final solution with all the findings and gotchas: