Serverless Containers in 2026: Fargate vs Cloud Run vs Azure Container Apps
You want the operational simplicity of serverless (no servers to manage, automatic scaling, pay-per-use pricing), but your application needs containers. Maybe it is a Java app that does not fit within Lambda's constraints, a Python ML model that needs specific system libraries, or simply a Docker image you have already built and tested. Serverless containers give you the best of both worlds. Here is how the three major platforms compare in 2026 and, more importantly, how to reason about which one fits your workload instead of relying on the platforms' marketing pages.
What Are Serverless Containers
These platforms run your Docker images without requiring you to provision, patch, or manage the underlying infrastructure. You push a container image, configure scaling rules, and the platform handles the rest, including (on Cloud Run and Container Apps) scaling to zero when there is no traffic. The mental shift is significant: instead of thinking in terms of always-on machines sized for peak load, you think in terms of request concurrency and let the platform allocate capacity for you.
Crucially, the unit of deployment is still a standard OCI image. That means your local Docker build, your CI pipeline, and your production runtime all share the same artifact. Consequently, the “works on my machine” class of bug largely disappears, and you keep the freedom to bundle any runtime, system library, or language version you need — something pure function-as-a-service platforms struggle with.
```text
Traditional: Code → Docker Image → EC2/VM → Load Balancer → Users
Serverless:  Code → Docker Image → Platform → Users
             (scaling, networking, TLS, monitoring = handled)
```
Platform Comparison at a Glance
| Feature | AWS Fargate | Google Cloud Run | Azure Container Apps |
|---|---|---|---|
| Scale to zero | No (an ECS service can sit at 0 tasks, but nothing starts one when a request arrives) | Yes | Yes |
| Cold start | 30-60s (ECS task launch) | 1-5s | 2-10s |
| Max timeout | Unlimited (ECS) | 60 min | 30 min |
| Max memory | 120 GB | 32 GB | 8 GB (Consumption profile) |
| Max vCPUs | 16 | 8 | 4 (Consumption profile) |
| GPU support | No (GPUs require EC2-backed ECS or EKS) | Yes (e.g. NVIDIA L4) | Yes (serverless NVIDIA T4, A100) |
| Min instances | 1+ (ECS) | 0 | 0 |
| Pricing model | Per vCPU-second + memory | Per request + vCPU/memory | Per vCPU-second + memory |
| Free tier | None | 180K vCPU-s, 360K GiB-s, 2M requests/month | 180K vCPU-s, 360K GiB-s, 2M requests/month |
AWS Fargate: The Enterprise Workhorse
Fargate is AWS's serverless compute engine for containers, running on either ECS (Elastic Container Service) or EKS (Elastic Kubernetes Service). It is the most mature and feature-rich option, but also the most complex. In practice, teams adopt Fargate not because it is the simplest, but because it slots into an existing AWS estate — the same VPCs, the same IAM roles, the same observability tooling they already run.
```yaml
# ECS Task Definition for Fargate
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: my-api
      RequiresCompatibilities: [FARGATE]
      NetworkMode: awsvpc
      Cpu: "1024" # 1 vCPU
      Memory: "2048" # 2 GB
      ExecutionRoleArn: !GetAtt ExecutionRole.Arn
      TaskRoleArn: !GetAtt TaskRole.Arn
      ContainerDefinitions:
        - Name: api
          Image: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/my-api:latest"
          PortMappings:
            - ContainerPort: 8080
              Protocol: tcp
          Environment:
            - Name: SPRING_PROFILES_ACTIVE
              Value: production
          Secrets:
            - Name: DATABASE_URL
              ValueFrom: !Ref DatabaseSecret
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/my-api
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: api
          HealthCheck:
            Command: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
            Interval: 30
            Timeout: 5
            Retries: 3
  Service:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref Cluster
      ServiceName: my-api
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 2
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets: [!Ref PrivateSubnet1, !Ref PrivateSubnet2]
          SecurityGroups: [!Ref ServiceSG]
      LoadBalancers:
        - ContainerName: api
          ContainerPort: 8080
          TargetGroupArn: !Ref TargetGroup
```
Notice how much surrounding infrastructure this single service implies: an execution role, a task role, two subnets, a security group, a target group, and a log group. That verbosity is the cost of Fargate's flexibility. On the upside, every one of those resources is a first-class, auditable AWS object, which is exactly what compliance-heavy organizations want.
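Fargate also accepts only specific task-level CPU/memory pairings, and ECS rejects a task definition such as `Cpu: "1024"` with `Memory: "512"`. The helper below is a minimal sketch of a pre-deploy check. The allowed combinations are transcribed from the AWS ECS documentation at the time of writing and should be treated as an assumption to re-verify; `checkFargateSize` is a hypothetical name, not an AWS API.

```javascript
// Fargate task-level CPU/memory pairings (Linux), transcribed from the AWS ECS
// docs at the time of writing. Assumption: re-check the current table.
// cpu is in CPU units (1024 = 1 vCPU); memory is in MiB.
const GIB = 1024;

// Memory sizes from minGiB to maxGiB inclusive, in stepGiB increments, as MiB.
function gibSteps(minGiB, maxGiB, stepGiB) {
  const sizes = [];
  for (let g = minGiB; g <= maxGiB; g += stepGiB) sizes.push(g * GIB);
  return sizes;
}

const FARGATE_MEMORY_BY_CPU = {
  256: [512, 1 * GIB, 2 * GIB],
  512: gibSteps(1, 4, 1),
  1024: gibSteps(2, 8, 1),
  2048: gibSteps(4, 16, 1),
  4096: gibSteps(8, 30, 1),
  8192: gibSteps(16, 60, 4),
  16384: gibSteps(32, 120, 8),
};

// Hypothetical pre-deploy check: returns null if the pairing is valid,
// otherwise a message explaining why ECS would reject it.
function checkFargateSize(cpu, memoryMiB) {
  const allowed = FARGATE_MEMORY_BY_CPU[cpu];
  if (!allowed) {
    return `CPU ${cpu} is not a Fargate size (use ${Object.keys(FARGATE_MEMORY_BY_CPU).join(", ")})`;
  }
  if (!allowed.includes(memoryMiB)) {
    const min = allowed[0];
    const max = allowed[allowed.length - 1];
    return `Memory ${memoryMiB} MiB is not valid with CPU ${cpu} (allowed ${min}-${max} MiB in fixed steps)`;
  }
  return null;
}

// The template above uses string values, so convert before checking.
console.log(checkFargateSize(Number("1024"), Number("2048"))); // null
```

Running a check like this in CI catches an invalid size before CloudFormation spends minutes rolling back a failed stack.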
Fargate strengths:
- Deep AWS ecosystem integration (ALB, VPC, IAM, CloudWatch, Secrets Manager)
- EKS support for Kubernetes-native workflows
- Best for long-running services that need consistent capacity
- Up to 120 GB memory and 16 vCPUs per task
Fargate weaknesses:
- No request-driven scale-to-zero on ECS (a service needs at least one running task to serve traffic)
- Complex networking setup (VPC, subnets, security groups required)
- Slow task startup (30–60 seconds for ECS tasks)
- Pricing can be expensive for bursty workloads
Google Cloud Run: The Developer Favorite
Cloud Run takes the opposite approach to Fargate — radical simplicity. Deploy a container with a single command:
```bash
# Build and deploy in one step
gcloud run deploy my-api \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --min-instances 0 \
  --max-instances 100 \
  --memory 512Mi \
  --cpu 1 \
  --timeout 300 \
  --set-env-vars "NODE_ENV=production"
```
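That is it. Cloud Run builds the container image from source with Cloud Build, pushes it to Artifact Registry, deploys the service, provisions an HTTPS endpoint with a managed certificate, and configures autoscaling; mapping a custom domain is a separate, optional step. No VPC, no load balancer configuration, no IAM roles to write by hand.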
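```javascript
// A minimal Cloud Run service (any HTTP server works)
import express from "express";

const app = express();
app.use(express.json());

app.get("/api/health", (req, res) => {
  // Cloud Run injects K_SERVICE and K_REVISION; it does not set a region variable
  res.json({ status: "healthy", service: process.env.K_SERVICE, revision: process.env.K_REVISION });
});

// Placeholder for your real workload
async function heavyComputation(payload) {
  return { received: payload };
}

app.post("/api/process", async (req, res) => {
  try {
    const result = await heavyComputation(req.body);
    res.json(result);
  } catch (err) {
    res.status(500).json({ error: "processing failed" });
  }
});

// Cloud Run sets the PORT environment variable
const port = process.env.PORT || 8080;
app.listen(port, () => console.log(`Listening on port ${port}`));
```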
Cloud Run strengths:
- True scale-to-zero (zero cost when idle)
- Fast cold starts (1–5 seconds; requests served by warm min instances skip them entirely)
- Simplest developer experience of all three
- Built-in HTTPS, custom domains, traffic splitting
- GPU support for ML inference workloads (NVIDIA L4)
- Generous free tier (2 million requests/month)
Cloud Run weaknesses:
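- 60-minute maximum request timeout
- Limited to 32 GB memory, 8 vCPUs
- No persistent local disk (the container filesystem is in-memory and ephemeral; durable data needs network-backed Cloud Storage or NFS volume mounts, or an external service)
- Less granular networking controls compared to Fargate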
Concurrency: The Setting That Changes Everything
One detail dominates both cost and latency on Cloud Run and Container Apps, yet it rarely appears in comparison tables: per-instance concurrency. On Cloud Run, a single container instance can handle many simultaneous requests — the default ceiling is 80, and you can raise it. This is fundamentally different from the one-request-per-instance model that classic FaaS uses, and it is why Cloud Run is so cost-effective for I/O-bound web services.
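To see why concurrency lowers the bill, consider a simplified model of Cloud Run's request-based billing: an instance is charged while it is handling at least one request, so overlapping requests on the same instance share billed time. The sketch below (`billableMs` is a hypothetical helper) merges overlapping request intervals. It deliberately ignores startup time, billing granularity, and the fact that requests sharing a CPU may each run slower.

```javascript
// Simplified model: billed time for one instance is the union of the
// intervals during which it is serving at least one request.
// Each request is [startMs, endMs].
function billableMs(requests) {
  const sorted = [...requests].sort((a, b) => a[0] - b[0]);
  let total = 0;
  let spanStart = null;
  let spanEnd = null;
  for (const [start, end] of sorted) {
    if (spanEnd === null || start > spanEnd) {
      // Gap: close the previous busy span and open a new one.
      if (spanEnd !== null) total += spanEnd - spanStart;
      spanStart = start;
      spanEnd = end;
    } else {
      // Overlap: extend the current busy span instead of billing twice.
      spanEnd = Math.max(spanEnd, end);
    }
  }
  if (spanEnd !== null) total += spanEnd - spanStart;
  return total;
}

// Four 200 ms requests arriving 50 ms apart.
const burst = [[0, 200], [50, 250], [100, 300], [150, 350]];
// Concurrency >= 4: one instance serves them all.
console.log(billableMs(burst)); // 350
// Concurrency 1: each overlapping request needs its own instance.
console.log(burst.reduce((sum, [s, e]) => sum + (e - s), 0)); // 800
```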
```bash
# Tune concurrency to match your workload profile

# I/O-bound JSON API: pack many requests per instance
gcloud run deploy my-api \
  --source . \
  --concurrency 80 \
  --cpu 1 --memory 512Mi

# CPU-bound image resizing or ML inference: one heavy request per
# instance avoids CPU starvation
gcloud run deploy image-worker \
  --source . \
  --concurrency 1 \
  --cpu 4 --memory 8Gi
```
The trade-off is concrete. A high concurrency value means fewer instances, lower cost, and fewer cold starts, but a single slow request can now contend for CPU with its neighbors. Conversely, setting concurrency to 1 gives each request a dedicated container, which is ideal for CPU-bound or memory-hungry tasks, at the price of more instances and a higher bill. A practical rule of thumb: load-test a single instance with representative requests at increasing concurrency, then set concurrency just below the point where p99 latency starts to climb.
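Both halves of that trade-off can be put into numbers. The sketch below uses Little's law (requests in flight = arrival rate × latency) to estimate a rough minimum instance count, and a knee-finding helper that applies the "just below the point where p99 starts to climb" rule to load-test results. The function names, the 20% tolerance, and the sample measurements are all hypothetical, chosen for illustration.

```javascript
// Little's law: requests in flight = arrival rate x average latency.
// Dividing by per-instance concurrency gives a rough lower bound on
// instances; real autoscalers keep extra headroom on top of this.
function estimateInstances(requestsPerSecond, avgLatencyMs, concurrency) {
  const inFlight = (requestsPerSecond * avgLatencyMs) / 1000;
  return Math.ceil(inFlight / concurrency);
}

// Pick the highest tested concurrency before p99 latency rises more than
// `tolerance` above the lowest-concurrency baseline (the "knee").
function pickConcurrency(samples, tolerance = 0.2) {
  const sorted = [...samples].sort((a, b) => a.concurrency - b.concurrency);
  const limit = sorted[0].p99Ms * (1 + tolerance);
  let chosen = sorted[0].concurrency;
  for (const s of sorted) {
    if (s.p99Ms > limit) break;
    chosen = s.concurrency;
  }
  return chosen;
}

// Made-up single-instance load-test results, for illustration only.
const loadTest = [
  { concurrency: 1, p99Ms: 120 },
  { concurrency: 10, p99Ms: 125 },
  { concurrency: 20, p99Ms: 130 },
  { concurrency: 40, p99Ms: 140 },
  { concurrency: 80, p99Ms: 310 },
];

console.log(pickConcurrency(loadTest)); // 40
console.log(estimateInstances(500, 200, 40)); // 3 (500 req/s at 200 ms each)
console.log(estimateInstances(500, 200, 1)); // 100 with concurrency 1
```

The same 500 req/s load needs roughly three instances at concurrency 40 but a hundred at concurrency 1, which is the cost gap described above.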
Cold Starts and How to Tame Them
Cold starts are the most over-discussed and under-measured property of these managed platforms. The number that matters is not the platform's advertised range but your own image's initialization time. A Go binary in a scratch image typically starts in well under a second; a Spring Boot fat JAR can take ten seconds or more before it serves its first request, regardless of platform.
There are three practical levers. First, shrink the image — multi-stage builds and distroless or alpine base images reduce pull time. Second, keep a small floor of warm instances during business hours so users never pay the cold-start tax. Third, defer expensive work out of the startup path. The pattern below sets a minimum instance count, which all three platforms support in some form.
```bash
# Keep one warm instance at all times; dropping back to zero off-peak
# needs a separate (e.g. scheduled) update that sets --min-instances 0
gcloud run services update my-api \
  --min-instances 1 \
  --max-instances 100 \
  --cpu-boost # allocate extra CPU during startup to cut cold-start time
```
Be honest about the economics: a single always-warm instance erodes part of the scale-to-zero savings. For a low-traffic internal tool, accepting a two-second cold start is usually the right call. For a customer-facing checkout API, a warm floor is cheap insurance against abandoned carts.
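The third lever, deferring expensive work out of the startup path, usually means letting the server start listening before heavy resources (models, caches, large clients) are loaded. A minimal sketch, assuming a hypothetical `lazy` helper and a stand-in model loader: the first caller triggers initialization, concurrent callers share the same in-flight promise, and a failed initialization is retried on the next call instead of being cached forever.

```javascript
// Wrap an expensive async initializer so it runs on first use, not at startup.
function lazy(init) {
  let promise = null;
  return function get() {
    if (promise === null) {
      promise = Promise.resolve()
        .then(init)
        .catch((err) => {
          promise = null; // allow a retry on the next call
          throw err;
        });
    }
    return promise; // concurrent callers share this one promise
  };
}

// Hypothetical expensive resource: stands in for loading an ML model,
// opening a large connection pool, or warming a cache.
let initCount = 0;
const getModel = lazy(async () => {
  initCount += 1;
  await new Promise((resolve) => setTimeout(resolve, 50)); // simulated slow load
  return { predict: (x) => x * 2 };
});

// In the Express service, a handler awaits the resource only when it needs it:
//   app.post("/api/predict", async (req, res) => {
//     const model = await getModel();
//     res.json({ result: model.predict(req.body.value) });
//   });
```

This trades a faster startup for a slower first request that touches the resource, so it pays off most when only some requests need the expensive dependency.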
Azure Container Apps: The Kubernetes-Powered Middle Ground
Azure Container Apps (ACA) runs on Kubernetes, using KEDA for autoscaling and Envoy for ingress, but hides the cluster from you entirely:
```bash
# Deploy with Azure CLI
az containerapp create \
  --name my-api \
  --resource-group my-rg \
  --environment my-env \
  --image myregistry.azurecr.io/my-api:latest \
  --target-port 8080 \
  --ingress external \
  --min-replicas 0 \
  --max-replicas 30 \
  --cpu 1.0 \
  --memory 2.0Gi \
  --env-vars "ASPNETCORE_ENVIRONMENT=Production" \
  --secrets "db-conn=keyvaultref:https://my-vault.vault.azure.net/secrets/db-conn"
```
ACA's killer feature is built-in Dapr integration for microservices patterns:
```yaml
# Container App with Dapr sidecar (properties fragment; containers omitted)
properties:
  configuration:
    dapr:
      enabled: true
      appId: order-service
      appPort: 8080
    secrets:
      - name: db-connection
        keyVaultUrl: https://my-vault.vault.azure.net/secrets/db-connection
        identity: system
      - name: queue-connection # hypothetical secret used by the queue-scaling rule
        keyVaultUrl: https://my-vault.vault.azure.net/secrets/queue-connection
        identity: system
    ingress:
      external: true
      targetPort: 8080
      traffic:
        - latestRevision: true
          weight: 80
        - revisionName: order-service--v2
          weight: 20 # Canary deployment
  template:
    scale:
      minReplicas: 0
      maxReplicas: 30
      rules:
        - name: http-scaling
          http:
            metadata:
              concurrentRequests: "50"
        - name: queue-scaling
          azureQueue:
            queueName: orders
            queueLength: 10
            auth:
              - secretRef: queue-connection
                triggerParameter: connection
```
The queue-scaling rule above is what sets ACA apart for event-driven systems. Because scaling is powered by KEDA, you are not limited to HTTP request count — you can scale on the depth of an Azure Storage Queue, a Service Bus topic, Kafka lag, or a custom Prometheus metric. A worker that drains a backlog and then scales back to zero is a one-block configuration change rather than a custom autoscaler.
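A simplified model of the replica count KEDA and the Kubernetes Horizontal Pod Autoscaler arrive at for that queue rule: at or below an activation threshold (assumed to be 0 messages here) the app sits at `minReplicas`, which is zero in the configuration above; otherwise it targets roughly `queueLength` messages per replica, clamped to the configured bounds. This is a sketch with a hypothetical `desiredReplicas` function, not KEDA's code: the real autoscaler also applies a tolerance band, stabilization windows, and scaling-rate limits, and takes the highest result across all rules.

```javascript
// Simplified KEDA + HPA replica math for a queue-length scale rule.
function desiredReplicas({
  queueLength, // current messages in the queue
  targetPerReplica, // `queueLength` in the scale rule (10 above)
  minReplicas, // 0 allows scale to zero
  maxReplicas,
  activationThreshold = 0, // assumed: at or below this, the scaler is inactive
}) {
  if (queueLength <= activationThreshold) return minReplicas;
  const raw = Math.ceil(queueLength / targetPerReplica);
  // Once active, run at least one replica and stay within the bounds.
  return Math.min(maxReplicas, Math.max(minReplicas, 1, raw));
}

const rule = { targetPerReplica: 10, minReplicas: 0, maxReplicas: 30 };
for (const queueLength of [0, 7, 95, 1000]) {
  console.log(queueLength, desiredReplicas({ ...rule, queueLength }));
}
// 0 0
// 7 1
// 95 10
// 1000 30
```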
ACA strengths:
- Scale-to-zero with KEDA-based scaling (HTTP, queue, custom metrics)
- Dapr integration for service invocation, pub/sub, state management
- Traffic splitting for canary/blue-green deployments built in
- Azure Key Vault integration for secrets
- Job support for batch processing
ACA weaknesses:
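- Lowest per-replica ceiling of the three on the Consumption profile (4 vCPUs, 8 GB); dedicated workload profiles go higher
- GPU workloads (serverless NVIDIA T4 and A100) are available only in selected regions
- Younger platform with smaller community
- Azure ecosystem lock-in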
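The Dapr integration listed among ACA's strengths means an app talks to other services and message brokers through the sidecar's HTTP API on localhost instead of hard-coding hostnames. A minimal sketch of service invocation and pub/sub publishing (Node 18+ for the global `fetch`). The `inventory-service` app ID, the `reserve` method, and the `pubsub` component name are hypothetical; the code reads `DAPR_HTTP_PORT` if set and otherwise assumes Dapr's default port 3500.

```javascript
// Dapr sidecar HTTP API on localhost.
const DAPR_PORT = process.env.DAPR_HTTP_PORT || "3500";
const DAPR_BASE = `http://localhost:${DAPR_PORT}/v1.0`;

// Service invocation: Dapr resolves the target app by its Dapr app ID.
function invokeUrl(appId, method) {
  return `${DAPR_BASE}/invoke/${encodeURIComponent(appId)}/method/${method}`;
}

// Pub/sub: publish to a topic on a named pub/sub component.
function publishUrl(pubsubName, topic) {
  return `${DAPR_BASE}/publish/${encodeURIComponent(pubsubName)}/${encodeURIComponent(topic)}`;
}

// Hypothetical call from order-service to an inventory app.
async function reserveStock(order) {
  const res = await fetch(invokeUrl("inventory-service", "reserve"), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(order),
  });
  if (!res.ok) throw new Error(`inventory-service returned ${res.status}`);
  return res.json();
}

// Publish an event; a successful publish returns no body.
async function publishOrderCreated(order) {
  const res = await fetch(publishUrl("pubsub", "orders"), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(order),
  });
  if (!res.ok) throw new Error(`publish failed with ${res.status}`);
}
```

Because the app only ever addresses `localhost`, the same code runs unchanged wherever a Dapr sidecar is present, which is what keeps the microservice wiring out of application code.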
Cost Comparison: Real-World Scenarios
Scenario 1: API with steady traffic (100 req/s, 24/7)
| Platform | Monthly Cost | Notes |
|---|---|---|
| Fargate (1 vCPU, 2GB) | ~$75 | Always-on, no scale-to-zero |
| Cloud Run (1 vCPU, 512MB) | ~$45 | Scales to match traffic |
| Container Apps (1 vCPU, 2GB) | ~$55 | Scales to match traffic |
Scenario 2: Bursty API (0-500 req/s, active 8h/day)
| Platform | Monthly Cost | Notes |
|---|---|---|
| Fargate (1 vCPU, 2GB) | ~$75 | Pays full price 24/7 |
| Cloud Run (auto-scale) | ~$18 | Scales to zero overnight |
| Container Apps (auto-scale) | ~$22 | Scales to zero overnight |
Scenario 3: Background job (runs 2h/day, needs 4 vCPU, 8GB)
| Platform | Monthly Cost | Notes |
|---|---|---|
| Fargate (scheduled task) | ~$35 | EventBridge schedule + Fargate |
| Cloud Run Jobs | ~$12 | Built-in job scheduler |
| Container Apps Jobs | ~$14 | Built-in job support |
These figures are representative list-price estimates, not personal measurements; your actual bill depends on region, committed-use discounts, and egress. Still, the shape holds: Cloud Run wins on cost for bursty and intermittent workloads due to true scale-to-zero and per-request billing, while Fargate is most cost-effective for steady, always-on services where you need predictable capacity. One frequently overlooked line item is network egress — data leaving any of these platforms is billed separately and can dwarf compute cost for media-heavy or chatty services, so model it before you commit.
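The shape of these scenarios follows from a simple formula: always-on cost scales with every hour in the month, scale-to-zero cost scales with active hours only, and egress is added on top of both. A rough sketch with every rate passed in as a parameter. The function names are hypothetical and no real prices are embedded, so plug in current list prices for your region.

```javascript
// Rough monthly cost model. Every rate is an input; nothing here is a real price.
const SECONDS_PER_HOUR = 3600;

// Compute cost for `instances` replicas running `hoursPerMonth` hours each,
// billed per vCPU-second and per GiB-second.
function computeCost({ instances, vcpu, memGiB, hoursPerMonth, vcpuRatePerSec, gibRatePerSec }) {
  const seconds = hoursPerMonth * SECONDS_PER_HOUR;
  return instances * seconds * (vcpu * vcpuRatePerSec + memGiB * gibRatePerSec);
}

// Egress is billed separately and added on top of compute.
function monthlyTotal(compute, egressGb, egressRatePerGb) {
  return compute + egressGb * egressRatePerGb;
}

// Daily active hours at which a pay-while-active deployment costs the same as
// an always-on one. alwaysOnHourly: hourly cost of the always-on deployment;
// activeHourly: cost per active hour of the scale-to-zero deployment, which
// can be higher per hour (e.g. higher per-second rates or per-request fees).
function breakEvenActiveHours(alwaysOnHourly, activeHourly) {
  return (24 * alwaysOnHourly) / activeHourly;
}

// If scale-to-zero costs 3x more per active hour, it wins below 8 active h/day.
console.log(breakEvenActiveHours(1, 3)); // 8
```

With that ratio, Scenario 2's 8 hours a day sits right at break-even, which is why the result is so sensitive to how bursty your traffic really is.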
When NOT to Use These Platforms
Managed container platforms are not a universal answer, and choosing them reflexively leads to predictable regret. Avoid them in the following situations. First, stateful workloads (databases, message brokers, anything that wants a stable identity and a persistent local disk) fit poorly because instances are ephemeral and, on Cloud Run, have no persistent local disk at all; the only durable storage is network-backed. Second, ultra-low-latency services with strict, single-digit-millisecond p99 budgets can be undermined by cold starts and scheduling jitter; a provisioned cluster gives more predictable tail latency.
Third, very large or GPU-clustered training jobs exceed the per-instance ceilings (especially ACA's Consumption-profile limit of 4 vCPUs and 8 GB) and belong on dedicated batch or Kubernetes infrastructure. Fourth, if you genuinely need fine-grained control over the kernel, sidecars, node placement, or custom networking, managed serverless abstracts away the very knobs you require, and a full Kubernetes cluster is the honest choice. Finally, sustained, perfectly steady traffic often runs cheaper on reserved VMs or savings-plan-backed instances than on per-second serverless billing. The trade-off is always the same: you exchange control and, sometimes, steady-state cost for operational simplicity.
Decision Framework
Choose Fargate when:
- You are already invested in the AWS ecosystem
- You need more than 32 GB memory or 8 vCPUs
- Your service must be always-on with consistent capacity
- You need deep VPC integration and private networking
- You want Kubernetes (EKS on Fargate)
Choose Cloud Run when:
- You want the simplest possible deployment experience
- Your traffic is bursty or unpredictable
- Cost optimization is a priority (scale-to-zero)
- You need GPU inference capabilities
- You are a startup or small team that values speed over customization
Choose Container Apps when:
- You need microservices patterns (Dapr, pub/sub, state management)
- You want KEDA-based scaling on queues and custom metrics
- You are building on Azure and want tight ecosystem integration
- You need canary deployments and traffic splitting out of the box
For further reading, refer to the AWS Fargate, Google Cloud Run, and Azure Container Apps documentation for comprehensive reference material. If you are weighing the broader trade-offs of moving compute off dedicated servers, our companion piece on serverless architecture patterns covers the application-design side in more depth.
All three platforms have matured significantly by 2026. The best choice depends less on technical capability and more on your team's existing cloud expertise and the specific operational requirements of your workload. Pick the platform that matches your ecosystem, set concurrency and minimum instances deliberately, and you will have a production-ready deployment in hours, not weeks.
In conclusion, serverless containers are a pragmatic default for many modern workloads. Applying the practices covered in this guide (tuning concurrency, managing cold starts, modeling true cost including egress, and knowing when to walk away) helps you build systems that are robust, scalable, and maintainable. Start simple, iterate, and keep measuring cost and latency so the platform keeps earning its place.