Pavan Rangani

GitHub Actions Self-Hosted Runners on Kubernetes: Complete Setup Guide

By Pavan Rangani · March 26, 2026 · DevOps & Cloud

GitHub Actions Self-Hosted Runners on Kubernetes

GitHub Actions self-hosted runners on Kubernetes give you the flexibility of self-managed CI/CD infrastructure with the convenience of GitHub Actions workflows. Instead of paying per-minute for GitHub-hosted runners or waiting in queues, you run your own runners on your Kubernetes cluster with auto-scaling, custom tooling, and full control over the execution environment.

This guide covers the complete setup using Actions Runner Controller (ARC) v2, the officially supported Kubernetes operator. You will learn how to deploy, scale, secure, and optimize self-hosted runners for production workloads, and crucially, how the controller maps GitHub’s job queue onto pods so you can reason about scaling behavior instead of treating it as a black box.

Why Self-Hosted Runners?

GitHub-hosted runners work well for simple workflows, but they have limitations that matter at scale. The most common drivers for self-hosting are cost at high CI volume, access to private network resources behind a VPC, and the need for specialized hardware like GPUs or large memory nodes that the hosted fleet does not offer.

GitHub-Hosted vs Self-Hosted Runners

┌────────────────────────┬───────────────┬───────────────────┐
│ Factor                 │ GitHub-Hosted │ Self-Hosted (K8s) │
├────────────────────────┼───────────────┼───────────────────┤
│ Cost (1000 min/month)  │ ~$40-80       │ ~$10-20 (infra)   │
│ Startup Time           │ 30-90s        │ 5-15s             │
│ Custom Tools           │ Limited       │ Full control      │
│ Network Access         │ Public only   │ Private VPC       │
│ GPU Support            │ Limited       │ Full NVIDIA/AMD   │
│ Cache Persistence      │ Limited       │ PVC-backed        │
│ Concurrent Jobs        │ Quota-limited │ Cluster capacity  │
│ Security               │ Ephemeral     │ Configurable      │
└────────────────────────┴───────────────┴───────────────────┘

These numbers are representative rather than guaranteed; your actual savings depend on whether the cluster is already running for other workloads. Notably, the startup-time advantage assumes warm minimum runners — a cold pod still pays for image pull and registration. For teams already operating Kubernetes, the marginal cost of CI capacity is genuinely low, which is what makes the economics compelling above a few thousand minutes per month.

[Figure: Self-hosted runners on Kubernetes with auto-scaling and custom images]

How ARC v2 Scales Runners

It helps to understand the control loop before configuring it. ARC v2 runs a listener that holds a long-poll connection to GitHub for your runner scale set, so no inbound webhook is required. When a workflow job is queued for the scale set, GitHub notifies the listener, and the controller creates an ephemeral runner pod that takes exactly one job. When the job finishes, the pod is destroyed. This one-job-per-pod model is the key behavioral difference from the legacy ARC, and it gives you clean, reproducible runs with no state bleeding between jobs.
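To make the scaling behavior concrete, here is a minimal sketch of the sizing decision. It is a simplified model, not ARC's actual code: it assumes the controller keeps the minimum number of idle runners on top of the jobs currently assigned to the scale set and caps the total at the maximum. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of the scale-set sizing decision (an assumption, not ARC's code):
# keep min_runners idle pods on top of the assigned jobs, capped at max_runners.
desired_runners() {
  local assigned_jobs=$1 min_runners=$2 max_runners=$3
  local target=$(( min_runners + assigned_jobs ))
  if (( target > max_runners )); then
    target=$max_runners
  fi
  echo "$target"
}

# Values from the scale set used in this guide: minRunners=2, maxRunners=20.
desired_runners 0 2 20    # idle: only the warm runners remain (prints 2)
desired_runners 5 2 20    # five assigned jobs plus the warm pool (prints 7)
desired_runners 50 2 20   # a burst is capped at maxRunners (prints 20)
```

Reading the model this way explains two things you will see in practice: jobs beyond the maximum wait in GitHub's queue rather than failing, and the warm pool is refilled as soon as a job claims one of its runners.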

Because pods are ephemeral, your minRunners value trades cost against latency. Setting it to zero scales fully to zero for maximum savings but adds pod-startup latency to the first job after idle. Conversely, keeping a few warm runners eliminates that cold start at the price of always-on compute. For example, a team with bursty mid-day traffic might keep two warm runners during business hours and scale to zero overnight using a scheduled patch of the resource.
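The business-hours example could be scripted roughly as follows. This is a sketch under assumptions: the schedule itself is hypothetical, the script only prints the command instead of running it, and it uses `helm upgrade --reuse-values` rather than patching the resource directly, because the scale set in this guide is managed by Helm. You would run it from cron or a Kubernetes CronJob.

```shell
#!/usr/bin/env bash
# Hypothetical schedule: two warm runners on weekdays 08:00-18:59, zero otherwise.
warm_runners_for() {
  local hour=$1 weekday=$2   # hour 0-23, weekday 1-7 (1 = Monday, as in `date +%u`)
  if (( weekday <= 5 && hour >= 8 && hour < 19 )); then
    echo 2
  else
    echo 0
  fi
}

# Print (rather than run) the Helm command that would apply the value.
# Release name, namespace, and chart match the install commands in this guide.
scale_command() {
  local min_runners=$1
  echo "helm upgrade k8s-runners --namespace arc-runners --reuse-values" \
       "--set minRunners=${min_runners}" \
       "oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set"
}

# 10# forces base 10 so hours such as "08" are not parsed as octal.
scale_command "$(warm_runners_for "$((10#$(date +%H)))" "$(date +%u)")"
```

Printing the command first makes the schedule easy to review; once you trust it, pipe the output to `bash` or call Helm directly. Pin the chart with `--version` in real use so a scheduled upgrade never changes the chart version by accident.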

Installing Actions Runner Controller (ARC)

# Install the ARC v2 controller with Helm.
# The v2 charts are published as OCI artifacts, so no `helm repo add` is needed
# (the actions-runner-controller.github.io repository hosts the legacy chart).
helm install arc \
  --namespace arc-system \
  --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# Create a GitHub App for authentication (recommended over PAT)
# Go to: GitHub Org Settings → Developer Settings → GitHub Apps
# Permissions needed:
#   - Organization: Self-hosted runners (Read & Write)
#   - Repository: Actions (Read), Metadata (Read)

# Create the runner namespace, then a secret with the GitHub App credentials
kubectl create namespace arc-runners
kubectl create secret generic github-app-secret \
  --namespace arc-runners \
  --from-literal=github_app_id=123456 \
  --from-literal=github_app_installation_id=78901234 \
  --from-file=github_app_private_key=./private-key.pem

Prefer a GitHub App over a personal access token. A PAT is tied to an individual, carries that person’s full scope, and breaks the moment they leave the org or rotate the token. In contrast, a GitHub App has narrowly scoped, organization-owned permissions and a far higher API rate limit, which matters because ARC polls GitHub continuously. Therefore, the App approach is both more secure and more reliable for production.

Deploying Runner Scale Sets

# runner-scale-set.yaml
# Helm values for the gha-runner-scale-set chart. The chart renders the
# AutoscalingRunnerSet resource from these values.
githubConfigUrl: "https://github.com/myorg"
githubConfigSecret: github-app-secret
minRunners: 2        # Always keep 2 warm runners
maxRunners: 20       # Scale up to 20 during peak
runnerGroup: "kubernetes"   # This runner group must already exist in the organization

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          - name: DOCKER_HOST
            value: unix:///var/run/docker.sock
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "4"
            memory: "8Gi"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          # Share the directory that holds the socket, not the socket path itself
          - name: docker-socket
            mountPath: /var/run

      # Docker-in-Docker sidecar for container builds
      - name: dind
        image: docker:dind
        # 123 is the GID of the docker group in the official runner image
        args: ["dockerd", "--host=unix:///var/run/docker.sock", "--group=123"]
        securityContext:
          privileged: true
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: docker-socket
            mountPath: /var/run

    volumes:
      - name: work
        emptyDir: {}
      - name: docker-socket
        emptyDir: {}

    # Node affinity for dedicated CI nodes
    nodeSelector:
      workload-type: ci-runner
    tolerations:
      - key: "ci-runner"
        operator: "Exists"
        effect: "NoSchedule"

# Deploy the runner scale set (the release name becomes the runs-on label)
helm install k8s-runners \
  --namespace arc-runners \
  --create-namespace \
  -f runner-scale-set.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Two configuration choices deserve attention. The resource requests drive scheduling and the bin-packing of pods onto nodes, so set them to a realistic average; setting them too high wastes capacity, while setting them too low lets noisy builds starve their neighbors. Meanwhile, the nodeSelector and tolerations isolate CI onto dedicated nodes so a runaway build cannot evict your production services. Furthermore, pinning runners to a tainted node pool lets you use cheaper spot or preemptible instances for CI, as long as you accept that a job interrupted by node reclamation fails and has to be re-run.
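The bin-packing effect of the requests is easy to estimate with a little arithmetic. The sketch below assumes a hypothetical node size and counts only the runner container's requests (the sidecar declares none in the configuration above); the function names are illustrative.

```shell
#!/usr/bin/env bash
# How many runner pods fit on one node, judged by resource requests alone.
# Units: CPU in millicores, memory in MiB. The scarcer resource sets the limit.
pods_per_node() {
  local node_cpu_m=$1 node_mem_mi=$2 req_cpu_m=$3 req_mem_mi=$4
  local by_cpu=$(( node_cpu_m / req_cpu_m ))
  local by_mem=$(( node_mem_mi / req_mem_mi ))
  echo $(( by_cpu < by_mem ? by_cpu : by_mem ))
}

# Nodes needed to host a given number of runner pods (rounded up).
nodes_needed() {
  local pods=$1 per_node=$2
  echo $(( (pods + per_node - 1) / per_node ))
}

# Hypothetical node with 15500m CPU and 60000Mi allocatable; the runner requests
# from the scale set above are 2 CPU (2000m) and 4Gi (4096Mi).
per_node=$(pods_per_node 15500 60000 2000 4096)
echo "runner pods per node: ${per_node}"                        # 7 (CPU-bound)
echo "nodes for maxRunners=20: $(nodes_needed 20 "$per_node")"  # 3
```

In this example the node is CPU-bound: memory would allow 14 pods, so half of it sits idle. That is a hint either to lower the CPU request toward what builds actually use or to pick a node type with a CPU-to-memory ratio closer to the runner's.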

[Figure: Runner pods auto-scaling based on queued GitHub Actions jobs]

Custom Runner Images

# Dockerfile for custom runner with project-specific tools
FROM ghcr.io/actions/actions-runner:latest

# Install build tools.
# docker-compose-plugin is distributed through Docker's apt repository; remove it
# from the list if that repository is not configured in the base image.
RUN sudo apt-get update && sudo apt-get install -y \
    build-essential \
    openjdk-21-jdk \
    maven \
    gradle \
    nodejs \
    npm \
    docker-compose-plugin \
    && sudo rm -rf /var/lib/apt/lists/*

# Pre-cache common dependencies (owned by the runner user so builds can update them)
COPY --chown=runner:runner gradle-cache/ /home/runner/.gradle/
COPY --chown=runner:runner maven-cache/ /home/runner/.m2/

# Install kubectl and helm for deployment workflows
RUN curl -fsSLO "https://dl.k8s.io/release/v1.30.0/bin/linux/amd64/kubectl" \
    && sudo install kubectl /usr/local/bin/ \
    && rm kubectl \
    && curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Baking tools into the image is one of the biggest wins of self-hosting. On GitHub-hosted runners, anything missing from the stock image is downloaded again on every job, whereas a pre-built image starts each job with the JDK, build tools, and dependency caches already present. Consequently, build times drop and you are less exposed to flaky network failures during dependency resolution. Pre-warming the .gradle and .m2 caches in the image is especially effective for JVM projects, where cold dependency downloads often dominate short builds.

Security Hardening

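# Pod security for runners (merge into the scale set's template.spec).
# Not compatible with the privileged dind sidecar, which must run as root.
template:
  spec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001      # UID of the runner user in the official image
      fsGroup: 1001
      seccompProfile:
        type: RuntimeDefault
    containers:
      - name: runner
        securityContext:
          allowPrivilegeEscalation: false   # also disables sudo inside jobs
          readOnlyRootFilesystem: false     # the runner needs a writable filesystem
          capabilities:
            drop: ["ALL"]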
Self-hosted runners execute arbitrary code from your repositories, which makes them a genuine attack surface — especially on public repos where a malicious pull request could attempt to compromise the runner. For this reason, the official guidance is to never run self-hosted runners on public repositories without strict controls. The privileged Docker-in-Docker sidecar shown earlier is the weakest point here: it effectively grants root on the node. Therefore, prefer rootless alternatives like Kaniko or BuildKit in rootless mode for image builds, isolate runners in a dedicated namespace with NetworkPolicies, and apply the principle of least privilege through the dropped capabilities and seccomp profile above.
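As a starting point for the namespace isolation mentioned above, the following sketch renders a NetworkPolicy that denies all inbound traffic to runner pods and limits outbound traffic to DNS and HTTPS. The policy name and the port list are assumptions; builds that fetch over plain HTTP or reach in-cluster services on other ports will need extra rules, and the policy only takes effect if your CNI plugin enforces NetworkPolicies.

```shell
#!/usr/bin/env bash
# Render a NetworkPolicy for the runner namespace: no inbound traffic at all,
# outbound limited to DNS and HTTPS. Policy name and port list are assumptions.
render_runner_netpol() {
  local namespace=${1:-arc-runners}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: runner-isolation
  namespace: ${namespace}
spec:
  podSelector: {}          # every pod in the namespace
  policyTypes:
    - Ingress              # no ingress rules listed, so all inbound is denied
    - Egress
  egress:
    - ports:               # any destination, but only these ports
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
        - protocol: TCP
          port: 443
EOF
}

render_runner_netpol arc-runners
```

Review the rendered manifest, then apply it with `render_runner_netpol arc-runners | kubectl apply -f -`. Tightening egress further, for example to an allow-list of registry and GitHub address ranges, is worth doing once you know what your builds actually reach.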

Caching and PersistentVolumes

Ephemeral pods lose their local cache on every run, so a naive setup can be slower than GitHub-hosted runners despite faster startup. A common pattern is to back a shared cache with a PersistentVolumeClaim or an in-cluster S3-compatible store. Additionally, GitHub’s actions/cache works with self-hosted runners, but for large monorepos teams often run a self-hosted cache proxy to keep artifacts inside the cluster network. Be careful, however: a writable cache shared across untrusted jobs is a poisoning risk, so scope shared caches to trusted internal repositories only.
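One way to picture a PVC-backed cache is as a directory of immutable entries keyed by a hash of the dependency definitions. The helper functions below are a hypothetical sketch of that idea, not part of ARC or actions/cache; `CACHE_ROOT` is an assumed mount point for the shared volume, and the script relies on GNU coreutils as found in the runner image.

```shell
#!/usr/bin/env bash
# Content-addressed cache on a shared volume. CACHE_ROOT is a hypothetical mount
# point for a PVC; the key is a hash of the files that define the dependencies.
CACHE_ROOT=${CACHE_ROOT:-/cache}

# Hash the given files (for example lockfiles) into a 64-character key.
cache_key() {
  cat "$@" | sha256sum | cut -d' ' -f1
}

# Copy a cached directory into place; returns non-zero on a cache miss.
restore_cache() {
  local key=$1 target=$2
  [ -d "${CACHE_ROOT}/${key}" ] || return 1
  mkdir -p "$target"
  cp -a "${CACHE_ROOT}/${key}/." "$target/"
}

# Publish a directory under its key. Copy to a temp dir first, then rename, so a
# concurrent job does not restore a half-written entry.
save_cache() {
  local key=$1 source=$2 tmp
  [ -d "${CACHE_ROOT}/${key}" ] && return 0   # entries are immutable once written
  tmp=$(mktemp -d "${CACHE_ROOT}/.tmp.XXXXXX")
  cp -a "${source}/." "$tmp/"
  mv "$tmp" "${CACHE_ROOT}/${key}"
}
```

A job would compute `key=$(cache_key build.gradle settings.gradle)`, call `restore_cache "$key" "$HOME/.gradle/caches"` before the build, and `save_cache "$key" "$HOME/.gradle/caches"` after it. Because any job that can write here can plant files for later jobs, mount the volume read-only for anything that is not a trusted internal repository.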

Using in Workflows

# .github/workflows/build.yml
name: Build & Deploy
on: [push]
jobs:
  build:
    runs-on: k8s-runners  # Matches the runner scale set name
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: ./gradlew build
      - name: Test
        run: ./gradlew test
      - name: Deploy
        run: kubectl apply -f k8s/

The only workflow change is the runs-on label, which must match the scale set name exactly (for a Helm install, that is the release name unless you override it). This makes migration incremental: you can move one workflow at a time and keep the rest on GitHub-hosted runners while you build confidence. Moreover, because the custom runner image already contains kubectl and the pod runs inside the cluster network, deployment steps avoid the credential gymnastics that public runners require to reach a private cluster. The runner pod's service account still needs RBAC permissions for whatever kubectl apply touches.

[Figure: Monitoring runner utilization and workflow performance]

When NOT to Self-Host

Self-hosted runners add operational overhead. Therefore, stick with GitHub-hosted runners when your team is small (under 10 developers), you don’t need private network access, your workflows are simple and infrequent, or you lack Kubernetes expertise. The cost savings only justify the complexity above roughly 2000 CI minutes per month, and that calculation should include the engineering time to patch runner images, monitor the controller, and respond to scaling incidents.
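To see how strongly engineering time moves the break-even point, you can run the arithmetic yourself. The sketch below takes the midpoints of the comparison table as per-minute costs; the monthly overhead figures are purely illustrative assumptions, so substitute your own.

```shell
#!/usr/bin/env bash
# Break-even CI volume: the monthly minutes at which per-minute savings cover the
# fixed cost of operating the runners. All amounts are in cents to stay in
# integer arithmetic.
break_even_minutes() {
  local hosted_per_1000=$1 self_per_1000=$2 fixed_monthly=$3
  local saving_per_1000=$(( hosted_per_1000 - self_per_1000 ))
  if (( saving_per_1000 <= 0 )); then
    echo "never"
    return
  fi
  # minutes = fixed cost / saving per minute, rounded up
  echo $(( (fixed_monthly * 1000 + saving_per_1000 - 1) / saving_per_1000 ))
}

# Table midpoints: $60 hosted vs $15 self-hosted per 1000 minutes.
# Overheads of $90 and $300 per month are hypothetical.
break_even_minutes 6000 1500 9000    # prints 2000
break_even_minutes 6000 1500 30000   # prints 6667
```

The second call shows why the calculation has to include people: a few hours of maintenance per month pushes the break-even volume to several times the figure you get from infrastructure cost alone.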

There is also a hidden ongoing cost: security maintenance. A self-hosted fleet is your responsibility to keep patched, isolated, and free of leaked secrets, whereas GitHub-hosted runners are torn down and rebuilt fresh for every job. Consequently, unless you already operate Kubernetes with mature platform practices, the managed option is often the better business decision even at moderate volume.

Key Takeaways

GitHub Actions self-hosted runners on Kubernetes can provide faster builds, lower costs, and full environment control. ARC v2’s one-job-per-pod model makes scaling predictable, custom images cut build times, and dedicated tainted nodes keep CI from threatening production. Consider self-hosting when GitHub-hosted runner limitations (cost, private access, or specialized hardware) become a real bottleneck, but weigh the security and operational burden honestly before committing.

In conclusion, GitHub Actions self-hosted runners on Kubernetes are a powerful tool for teams that have outgrown the hosted fleet. By applying the patterns covered here (ARC v2 scale sets, custom images, security hardening, and disciplined caching), you can build CI that is faster and cheaper at scale. Start with a single workflow, measure the impact on build time and cost, and iterate before migrating your whole pipeline.
