
ArgoCD ApplicationSet for Multi-Cluster GitOps at Scale

By Pavan Rangani · May 7, 2026 · DevOps & Cloud


ArgoCD ApplicationSet Multi-Cluster: Scaling GitOps Past the Single-Cluster Era

Running ArgoCD ApplicationSet multi-cluster deployments has been my full-time GitOps focus for the past two years, during which we grew from three clusters to fifty-two across four cloud providers and two on-prem regions. The single-cluster ArgoCD model that worked beautifully at smaller scale becomes a productivity sink once you cross roughly ten clusters, and ApplicationSet is the official answer.

ApplicationSet is also one of the most misunderstood ArgoCD features. This article covers the generators that actually matter at scale, the matrix patterns that cut our manifest count by 80%, the progressive sync strategies that kept us out of all-cluster outages, and the rollback playbook we run when a global change goes wrong at 2 AM.

Why Plain Application Resources Stop Scaling

With ten clusters and twenty applications per cluster, you are managing two hundred Application resources. Every change — a new chart version, a values tweak, a destination shift — multiplies across that grid. Teams resort to scripts, Jsonnet, or homegrown CRDs, all of which drift from the GitOps principle of declarative state in Git.

The cognitive overhead of reasoning about which clusters got which version is brutal; postmortems frequently cite “we thought we had updated cluster X but we hadn’t” as a contributing cause. ApplicationSet replaces this with a single resource that generates Applications from parameters, evaluated continuously by the controller.

ArgoCD multi-cluster GitOps dashboard
ApplicationSet replaces hundreds of hand-written Application resources with a single generator-driven definition.

The Cluster Generator as the Foundation

The Cluster generator iterates over clusters registered with ArgoCD, optionally filtered by labels. This is the workhorse for fleet-wide deployments — observability stacks, security policies, base RBAC. It builds on the argocd.argoproj.io/secret-type=cluster Secret model, so cluster registration via your provisioning tool flows naturally into deployment scope.

Below is the pattern we use for our cluster-wide observability stack. We tag clusters with environment, region, and tier labels, and the generator filters on those:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: observability-stack
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
    - clusters:
        selector:
          matchLabels:
            argocd.argoproj.io/secret-type: cluster
          matchExpressions:
            - key: tier
              operator: In
              values: [production, staging]
  template:
    metadata:
      name: 'obs-{{.name}}'
      labels:
        environment: '{{index .metadata.labels "environment"}}'
        region: '{{index .metadata.labels "region"}}'
    spec:
      project: platform
      source:
        repoURL: https://github.com/acme/gitops-platform
        targetRevision: HEAD
        path: 'charts/observability'
        helm:
          valueFiles:
            - 'values/base.yaml'
            - 'values/{{index .metadata.labels "environment"}}.yaml'
            - 'values/region/{{index .metadata.labels "region"}}.yaml'
          parameters:
            - name: cluster.name
              value: '{{.name}}'
            - name: prometheus.externalLabels.cluster
              value: '{{.name}}'
      destination:
        server: '{{.server}}'
        namespace: observability
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        retry:
          limit: 5
          backoff:
            duration: 30s
            factor: 2
            maxDuration: 10m
        syncOptions:
          - CreateNamespace=true
          - ServerSideApply=true
          - PrunePropagationPolicy=foreground

This single resource produces fifty-plus Applications, one per matching cluster, each customized with environment-specific and region-specific values. Adding a new cluster then requires only registering the cluster Secret with the appropriate labels — no manual Application creation.
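Cluster registration is itself just a labeled Secret in the argocd namespace, which is what the generator's selector matches against. A minimal sketch — the cluster name, server URL, and credential values are placeholders, not real endpoints:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-east-1
  namespace: argocd
  labels:
    # Marks this Secret as a cluster registration for ArgoCD
    argocd.argoproj.io/secret-type: cluster
    # Labels the Cluster generator's selector filters on
    tier: production
    environment: production
    region: us-east-1
type: Opaque
stringData:
  name: prod-us-east-1
  server: https://prod-us-east-1.k8s.example.com:6443
  config: |
    {
      "bearerToken": "<redacted>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-ca-cert>"
      }
    }
```

Apply this Secret (or have your provisioning pipeline emit it) and the observability ApplicationSet above picks the cluster up on the controller's next reconcile.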

ArgoCD ApplicationSet Multi-Cluster: Matrix Generator Patterns

The Matrix generator computes the Cartesian product of two child generators. With N clusters and M applications, it produces N×M Applications from a single resource. We use this for tenant deployments where every customer namespace lands in every regional cluster:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: tenant-services
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - matrix:
        generators:
          - clusters:
              selector:
                matchLabels:
                  tier: production
          - git:
              repoURL: https://github.com/acme/tenants-config
              revision: HEAD
              files:
                - path: 'tenants/*/config.yaml'
  template:
    metadata:
      name: '{{.tenant.id}}-{{.name}}'
    spec:
      project: tenants
      source:
        repoURL: https://github.com/acme/tenants-config
        targetRevision: HEAD
        path: 'overlays/{{.tenant.tier}}'
        kustomize:
          namePrefix: '{{.tenant.id}}-'
          commonLabels:
            tenant: '{{.tenant.id}}'
            cluster: '{{.name}}'
      destination:
        server: '{{.server}}'
        namespace: 'tenant-{{.tenant.id}}'
      syncPolicy:
        automated:
          prune: true
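For the Git files generator above to surface {{.tenant.id}} and {{.tenant.tier}}, each matched file must contain a tenant map whose fields become template parameters. A hypothetical tenants/<id>/config.yaml — the id and tier values are illustrative:

```yaml
# tenants/acme-corp/config.yaml (hypothetical tenant)
tenant:
  id: acme-corp      # becomes {{.tenant.id}} in the template
  tier: premium      # becomes {{.tenant.tier}}, selects the kustomize overlay
```

Adding a tenant is then a single file commit; the matrix fans it out to every production cluster automatically.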

With twenty production clusters and three hundred tenants, this generates six thousand Applications. That controller load is non-trivial, so we raise --status-processors and --operation-processors on the application controller to handle it.
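These flags are normally set through the argocd-cmd-params-cm ConfigMap rather than by editing the controller workload directly. A sketch — the values are illustrative starting points, not recommendations, and should be tuned against your own fleet:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Surfaced to the application controller as --status-processors
  controller.status.processors: "50"
  # Surfaced as --operation-processors
  controller.operation.processors: "25"
```

The controller pods need a restart to pick up changes to this ConfigMap.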

GitOps fleet management across multiple regions
Matrix generators produce thousands of Applications from a single declarative resource.

Progressive Sync and Rollout Waves

Pushing a change to fifty clusters simultaneously is how you discover bugs at scale. ApplicationSet supports progressive sync via a RollingSync strategy, which advances through ordered groups based on Application labels. Note that progressive syncs are, at the time of writing, an alpha feature that must be explicitly enabled on the ApplicationSet controller (--enable-progressive-syncs).

We tag clusters as wave-1 (canary, 2 clusters), wave-2 (early production, 8 clusters), wave-3 (broad production, 30 clusters), and wave-4 (final, remainder). The strategy waits for each wave to reach Healthy and Synced before advancing, and you can add soak time between waves so a bad change has a chance to manifest before it spreads:

  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - key: wave
              operator: In
              values: ["1"]
          maxUpdate: 100%
        - matchExpressions:
            - key: wave
              operator: In
              values: ["2"]
          maxUpdate: 50%
        - matchExpressions:
            - key: wave
              operator: In
              values: ["3"]
          maxUpdate: 25%
        - matchExpressions:
            - key: wave
              operator: In
              values: ["4"]
          maxUpdate: 100%
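The matchExpressions above select on a wave label on the generated Applications, so the template has to set it. Assuming clusters carry a wave label on their registration Secrets, the relevant template fragment would look something like:

```yaml
  template:
    metadata:
      name: 'obs-{{.name}}'
      labels:
        # Propagated from the cluster Secret so RollingSync can group by wave
        wave: '{{index .metadata.labels "wave"}}'
```

With goTemplateOptions: ["missingkey=error"], a cluster registered without a wave label fails generation loudly instead of silently landing outside every wave.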

We also layer Argo Rollouts inside each cluster for canary analysis at the workload level, so a bad change has to pass intra-cluster canary analysis and then survive the cross-cluster wave soak before it reaches the full fleet. For deeper context on multi-cluster coordination, see my Cluster API multi-cluster guide and the rollout integration patterns in ECS Fargate deployment patterns.

Template Patches for Per-Cluster Customization

Sometimes you need a single field different on one cluster — a feature flag during a phased rollout, a debug log level on a problematic cluster. The ApplicationSet templatePatch field lets you override fields without forking the whole template: the patch is rendered with the same Go templating as the rest of the spec and merged over the generated Application template.

Compared with forking the manifest, this keeps the canonical template clean and isolates the override; when the override is no longer needed you remove three lines instead of reconciling a divergent fork. The official ApplicationSet documentation covers the patch semantics in detail.
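A sketch of such a patch, flipping a debug log level on a single cluster — the cluster name and Helm parameter here are hypothetical, and templatePatch requires a reasonably recent ArgoCD release:

```yaml
spec:
  # Rendered per generated Application; the conditional scopes it to one cluster
  templatePatch: |
    {{- if eq .name "prod-eu-west-1" }}
    spec:
      source:
        helm:
          parameters:
            - name: logLevel
              value: debug
    {{- end }}
```

Removing the override later means deleting this block, not untangling a forked template.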

Rollback Playbook

When a fleet-wide change goes wrong, you do not want to hand-edit fifty Applications. Our playbook: revert the offending Git commit, let the ApplicationSet generators regenerate the Applications pointing at the previous revision, and watch the progressive sync waves roll the fix forward. The same wave structure that limited blast radius on the way out limits it on the way back.

We keep spec.syncPolicy.automated.selfHeal off during incident response and manually sync each wave to retain control over pace. We also maintain a freeze ApplicationSet that disables automated sync fleet-wide via a single Git commit when we need to halt all GitOps activity.
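In CLI terms, the playbook is roughly the following — the branch name and wave labels are illustrative, and the commit SHA is whatever broke the fleet:

```shell
# 1. Revert the offending commit; the generators re-render from Git
git revert <bad-commit-sha>
git push origin main

# 2. With automated sync off, roll the fix through the waves manually,
#    waiting for health between each
argocd app sync -l wave=1
argocd app wait -l wave=1 --health
argocd app sync -l wave=2
argocd app wait -l wave=2 --health
# ...continue through the remaining waves
```

The label selector (-l) targets every generated Application in a wave at once, so the fix moves at exactly the pace you choose.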

ArgoCD ApplicationSet multi-cluster patterns are the difference between GitOps that scales and GitOps that becomes its own operational burden. Cluster and Matrix generators eliminate manifest sprawl, progressive sync limits blast radius, and template patches handle the inevitable per-cluster exceptions without breaking the model. Our fifty-two-cluster fleet runs with fewer operators than our three-cluster setup did three years ago, which is the clearest indicator that the approach works.
