GitHub Actions Self-Hosted Runners on Kubernetes
GitHub Actions self-hosted runners on Kubernetes give you the flexibility of self-managed CI/CD infrastructure with the convenience of GitHub Actions workflows. Instead of paying per-minute for GitHub-hosted runners or waiting in queues, you run your own runners on your Kubernetes cluster with auto-scaling, custom tooling, and full control over the execution environment.
This guide covers the complete setup using Actions Runner Controller (ARC) v2, the officially supported Kubernetes operator. You will learn how to deploy, scale, secure, and optimize self-hosted runners for production workloads, and crucially, how the controller maps GitHub’s job queue onto pods so you can reason about scaling behavior instead of treating it as a black box.
Why Self-Hosted Runners?
GitHub-hosted runners work well for simple workflows, but they have limitations that matter at scale. The most common drivers for self-hosting are cost at high CI volume, access to private network resources behind a VPC, and the need for specialized hardware like GPUs or large memory nodes that the hosted fleet does not offer.
GitHub-Hosted vs Self-Hosted Runners
┌───────────────────────┬───────────────┬───────────────────┐
│ Factor                │ GitHub-Hosted │ Self-Hosted (K8s) │
├───────────────────────┼───────────────┼───────────────────┤
│ Cost (1000 min/month) │ ~$40-80       │ ~$10-20 (infra)   │
│ Startup Time          │ 30-90s        │ 5-15s             │
│ Custom Tools          │ Limited       │ Full control      │
│ Network Access        │ Public only   │ Private VPC       │
│ GPU Support           │ Limited       │ Full NVIDIA/AMD   │
│ Cache Persistence     │ Limited       │ PVC-backed        │
│ Concurrent Jobs       │ Quota-limited │ Cluster capacity  │
│ Security              │ Ephemeral     │ Configurable      │
└───────────────────────┴───────────────┴───────────────────┘
These numbers are representative rather than guaranteed; your actual savings depend on whether the cluster is already running for other workloads. Notably, the startup-time advantage assumes warm minimum runners — a cold pod still pays for image pull and registration. For teams already operating Kubernetes, the marginal cost of CI capacity is genuinely low, which is what makes the economics compelling above a few thousand minutes per month.
How ARC v2 Scales Runners
It helps to understand the control loop before configuring it. In ARC v2, a listener pod holds a long-poll connection to GitHub for your runner scale set, so no inbound webhook endpoint is required. When a workflow job is queued for the scale set, GitHub notifies the listener, and the controller creates an ephemeral runner pod that registers, runs a single job, and is then destroyed. This one-job-per-pod model is the key behavioral difference from the legacy ARC, and it gives you clean, reproducible runs with no state bleeding between jobs.
Because pods are ephemeral, your minRunners value trades cost against latency. Setting it to zero scales fully to zero for maximum savings but adds pod-startup latency to the first job after idle. Conversely, keeping a few warm runners eliminates that cold start at the price of always-on compute. For example, a team with bursty mid-day traffic might keep two warm runners during business hours and scale to zero overnight using a scheduled patch of the resource.
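The scheduled patch mentioned above could be sketched as a Kubernetes CronJob. This is a minimal illustration rather than a tested setup: the ServiceAccount name runner-scaler, its RBAC permissions, and the container image are assumptions, and the resource name k8s-runners assumes the AutoscalingRunnerSet is named after the scale set.

```yaml
# Sketch: drop warm runners to zero at 20:00 on weekdays.
# Assumptions (not from the original setup):
#   - a ServiceAccount "runner-scaler" (hypothetical) allowed to patch
#     autoscalingrunnersets in the arc-runners namespace
#   - a placeholder image that ships kubectl
apiVersion: batch/v1
kind: CronJob
metadata:
  name: runners-scale-down
  namespace: arc-runners
spec:
  schedule: "0 20 * * 1-5"   # interpreted in the controller's time zone, usually UTC
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: runner-scaler
          restartPolicy: OnFailure
          containers:
            - name: patch
              image: example.com/tools/kubectl:1.30   # placeholder image
              command:
                - kubectl
                - patch
                - autoscalingrunnerset
                - k8s-runners
                - --namespace=arc-runners
                - --type=merge
                - '--patch={"spec":{"minRunners":0}}'
```

A mirror CronJob that patches minRunners back to 2 in the morning would restore warm capacity. Keep in mind that a later helm upgrade re-applies whatever minRunners value is in your values file, so the patch is only a temporary override.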
Installing Actions Runner Controller (ARC)
# Install the ARC v2 controller using Helm
# The chart is published as an OCI artifact, so no "helm repo add" step is needed
helm install arc \
  --namespace arc-system \
  --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
# Create a GitHub App for authentication (recommended over PAT)
# Go to: GitHub Org Settings → Developer Settings → GitHub Apps
# Permissions needed:
# - Organization: Self-hosted runners (Read & Write)
# - Repository: Actions (Read), Metadata (Read)
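# Create the runner namespace first; the secret cannot be created in a namespace that does not exist yet
kubectl create namespace arc-runners
# Create Kubernetes secret with GitHub App credentials
kubectl create secret generic github-app-secret \
  --namespace arc-runners \
  --from-literal=github_app_id=123456 \
  --from-literal=github_app_installation_id=78901234 \
  --from-file=github_app_private_key=./private-key.pem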
Prefer a GitHub App over a personal access token. A PAT is tied to an individual, carries that person’s full scope, and breaks the moment they leave the org or rotate the token. In contrast, a GitHub App has narrowly scoped, organization-owned permissions and a far higher API rate limit, which matters because ARC polls GitHub continuously. Therefore, the App approach is both more secure and more reliable for production.
Deploying Runner Scale Sets
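# runner-scale-set.yaml (Helm values for the gha-runner-scale-set chart)
# The chart renders the AutoscalingRunnerSet from these values, so the file
# holds plain values rather than a full manifest. The scale set name comes
# from the Helm release name (k8s-runners in the install command).
githubConfigUrl: "https://github.com/myorg"
githubConfigSecret: github-app-secret
minRunners: 2    # Always keep 2 warm runners
maxRunners: 20   # Scale up to 20 during peak
runnerGroup: "kubernetes"
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
        env:
          # Point the Docker CLI at the socket exposed by the dind sidecar
          - name: DOCKER_HOST
            value: unix:///var/run/docker.sock
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "4"
            memory: "8Gi"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          # Mount the directory, not the socket path: an emptyDir mounted at
          # /var/run/docker.sock would itself become a directory
          - name: docker-socket
            mountPath: /var/run
      # Docker-in-Docker sidecar for container builds
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --host=unix:///var/run/docker.sock
          - --group=$(DOCKER_GROUP_GID)
        env:
          # GID of the docker group in the runner image (123 in the chart's dind defaults)
          - name: DOCKER_GROUP_GID
            value: "123"
        securityContext:
          privileged: true
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: docker-socket
            mountPath: /var/run
    volumes:
      - name: work
        emptyDir: {}
      - name: docker-socket
        emptyDir: {}
    # Node affinity for dedicated CI nodes
    nodeSelector:
      workload-type: ci-runner
    tolerations:
      - key: "ci-runner"
        operator: "Exists"
        effect: "NoSchedule"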
# Deploy the runner scale set
helm install k8s-runners \
--namespace arc-runners \
--create-namespace \
-f runner-scale-set.yaml \
oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
Two configuration choices deserve attention. The resource requests drive scheduling and the bin-packing of pods onto nodes, so set them to a realistic average; setting them too high wastes capacity, while setting them too low lets noisy builds starve their neighbors. Meanwhile, the nodeSelector and tolerations isolate CI onto dedicated nodes so a runaway build cannot evict your production services. Furthermore, pinning runners to a tainted node pool lets you use cheaper spot or preemptible instances for CI. Note that a job whose pod is preempted mid-run fails rather than resuming, so this works best when your workflows are safe to re-run.
Custom Runner Images
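# Dockerfile for custom runner with project-specific tools
# For reproducible builds, pin a specific runner version tag instead of latest
FROM ghcr.io/actions/actions-runner:latest
# Install build tools (the base image runs as the non-root runner user, hence sudo)
# Note: docker-compose-plugin is typically published in Docker's apt repository,
# which is assumed to be configured in the image
RUN sudo apt-get update && sudo apt-get install -y \
    build-essential \
    openjdk-21-jdk \
    maven \
    gradle \
    nodejs \
    npm \
    docker-compose-plugin \
    && sudo rm -rf /var/lib/apt/lists/*
# Pre-cache common dependencies, owned by the runner user so builds can update them
COPY --chown=runner:runner gradle-cache/ /home/runner/.gradle/
COPY --chown=runner:runner maven-cache/ /home/runner/.m2/
# Install kubectl and helm for deployment workflows
RUN curl -fsSLO "https://dl.k8s.io/release/v1.30.0/bin/linux/amd64/kubectl" \
    && sudo install -m 0755 kubectl /usr/local/bin/kubectl \
    && rm kubectl \
    && curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash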
Baking tools into the image is one of the biggest wins of self-hosting. On GitHub-hosted runners, any tool or dependency that is not already in the hosted image must be downloaded again in every job, but a pre-built image starts a job with the JDK, build tools, and dependency caches already present. Consequently, build times drop and you escape flaky network failures during dependency resolution. Pre-warming the .gradle and .m2 caches in the image is especially effective for JVM projects, where cold dependency downloads often dominate short builds.
Security Hardening
# Pod security for runners
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: runner
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false
            capabilities:
              drop: ["ALL"]
Self-hosted runners execute arbitrary code from your repositories, which makes them a genuine attack surface — especially on public repos where a malicious pull request could attempt to compromise the runner. For this reason, the official guidance is to never run self-hosted runners on public repositories without strict controls. The privileged Docker-in-Docker sidecar shown earlier is the weakest point here: it effectively grants root on the node. Therefore, prefer rootless alternatives like Kaniko or BuildKit in rootless mode for image builds, isolate runners in a dedicated namespace with NetworkPolicies, and apply the principle of least privilege through the dropped capabilities and seccomp profile above.
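The NetworkPolicy isolation mentioned above might look like the following sketch. It assumes the cluster's CNI plugin enforces NetworkPolicy and that runners only need DNS and outbound HTTPS; the policy name is illustrative.

```yaml
# Sketch: restrict traffic for every pod in the runner namespace.
# Assumption: the CNI plugin in use actually enforces NetworkPolicy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: runners-restrict-traffic   # illustrative name
  namespace: arc-runners
spec:
  podSelector: {}        # applies to all pods in arc-runners
  policyTypes:
    - Ingress
    - Egress
  # No ingress rules are listed, so all inbound connections are denied
  egress:
    # DNS lookups
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Outbound HTTPS (GitHub, container registries)
    - ports:
        - protocol: TCP
          port: 443
```

Because the HTTPS rule has no destination selector, it also permits port 443 traffic to in-cluster services. Tightening it with an ipBlock or namespaceSelector is a reasonable next step once you know which endpoints your builds actually need.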
Caching and PersistentVolumes
Ephemeral pods lose their local cache on every run, so a naive setup can be slower than GitHub-hosted runners despite faster startup. A common pattern is to back a shared cache with a PersistentVolumeClaim or an in-cluster S3-compatible store. Additionally, GitHub’s actions/cache works with self-hosted runners, but for large monorepos teams often run a self-hosted cache proxy to keep artifacts inside the cluster network. Be careful, however: a writable cache shared across untrusted jobs is a poisoning risk, so scope shared caches to trusted internal repositories only.
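A PVC-backed shared cache could be sketched as below. The claim name, storage class, size, and mount path are all hypothetical, and ReadWriteMany requires a storage backend that supports it. The two documents belong in separate places: the first is a manifest to apply, the second is a fragment to merge into the runner pod template.

```yaml
# Sketch: a shared cache volume for trusted internal repositories only.
# Assumptions: "shared-rwx" is a hypothetical storage class that supports
# ReadWriteMany; the size and names are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ci-build-cache
  namespace: arc-runners
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared-rwx
  resources:
    requests:
      storage: 50Gi
---
# Fragment for the runner pod template: mount the claim into the runner.
template:
  spec:
    containers:
      - name: runner
        volumeMounts:
          - name: build-cache
            mountPath: /home/runner/cache   # hypothetical cache directory
    volumes:
      - name: build-cache
        persistentVolumeClaim:
          claimName: ci-build-cache
```

Not every build tool tolerates concurrent writers on a shared directory, so check your tool's guidance before pointing its cache at this mount.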
Using in Workflows
# .github/workflows/build.yml
name: Build & Deploy
on: [push]
jobs:
  build:
    runs-on: k8s-runners  # Matches the runner scale set name
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: ./gradlew build
      - name: Test
        run: ./gradlew test
      - name: Deploy
        run: kubectl apply -f k8s/
The only workflow change is the runs-on label, which must match the scale set name exactly (by default, the Helm release name). This makes migration incremental: you can move one workflow at a time and keep the rest on GitHub-hosted runners while you build confidence. Moreover, because the runner image already contains kubectl and the runner pod sits inside the cluster network, deployment steps can reach the API server without the credential gymnastics that public runners require to reach a private cluster, provided the pod's service account has the RBAC permissions the deployment needs.
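For the Deploy step to succeed, the runner pod needs Kubernetes permissions in the target namespace. The following is one possible least-privilege sketch; the ServiceAccount name ci-deployer, the target namespace apps, and the resource list are assumptions, and attaching the account to runner pods (for example through serviceAccountName in the pod template) is also assumed rather than shown.

```yaml
# Sketch: let runner pods apply Deployments, Services, and ConfigMaps in one namespace.
# All names below are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer
  namespace: arc-runners
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-deploy
  namespace: apps        # hypothetical target namespace
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deploy
  namespace: apps
subjects:
  - kind: ServiceAccount
    name: ci-deployer
    namespace: arc-runners
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ci-deploy
```

Scoping the Role to a single namespace, instead of binding a ClusterRole, limits what a compromised build can touch.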
When NOT to Self-Host
Self-hosted runners add operational overhead. Therefore, stick with GitHub-hosted runners when your team is small (under 10 developers), you don’t need private network access, your workflows are simple and infrequent, or you lack Kubernetes expertise. The cost savings only justify the complexity above roughly 2000 CI minutes per month, and that calculation should include the engineering time to patch runner images, monitor the controller, and respond to scaling incidents.
There is also a hidden ongoing cost: security maintenance. A self-hosted fleet is your responsibility to keep patched, isolated, and free of leaked secrets, whereas GitHub-hosted runners are torn down and rebuilt fresh for every job. Consequently, unless you already operate Kubernetes with mature platform practices, the managed option is often the better business decision even at moderate volume.
Key Takeaways
GitHub Actions self-hosted runners on Kubernetes provide faster builds, lower costs, and full environment control. ARC v2’s one-job-per-pod model makes scaling predictable, custom images cut build times, and dedicated tainted nodes keep CI from threatening production. As a result, consider self-hosting when GitHub-hosted runner limitations — cost, private access, or specialized hardware — become a real bottleneck, but weigh the security and operational burden honestly before committing.
Related Reading
- GitHub Actions CI/CD Pipeline Automation
- GitHub Actions Reusable Workflows
- Kubernetes Cost Optimization
GitHub Actions self-hosted runners on Kubernetes are a powerful tool for teams that have outgrown the hosted fleet. By applying the patterns covered here (ARC v2 scale sets, custom images, security hardening, and disciplined caching), you can build CI that is faster and cheaper at scale. Start with a single workflow, measure the impact on build time and cost, and iterate before migrating your whole pipeline.