Serverless Containers in 2026: Fargate vs Cloud Run vs Azure Container Apps

You want the operational simplicity of serverless (no servers to manage, automatic scaling, pay-per-use pricing), but your application needs containers. Maybe it is a Java app that does not fit within Lambda's constraints, a Python ML model that needs specific system libraries, or simply a Docker image you have already built and tested. Serverless containers give you the best of both worlds. Here is how the three major platforms compare in 2026 and, more importantly, how to reason about which one fits your workload instead of relying on the platform marketing pages.

What Are Serverless Containers

These platforms run your Docker images without requiring you to provision, patch, or manage the underlying infrastructure. You push a container image, configure scaling rules, and the platform handles the rest, including (on Cloud Run and Container Apps) scaling to zero when there is no traffic. The mental shift is significant: instead of thinking in terms of always-on machines that you size for peak load, you think in terms of request concurrency and the platform allocates capacity for you.

Crucially, the unit of deployment is still a standard OCI image. That means your local Docker build, your CI pipeline, and your production runtime all share the same artifact. Consequently, the “works on my machine” class of bug largely disappears, and you keep the freedom to bundle any runtime, system library, or language version you need — something pure function-as-a-service platforms struggle with.

Traditional:  Code → Docker Image → EC2/VM → Load Balancer → Users
Serverless:   Code → Docker Image → Platform → Users
                                   (scaling, networking, TLS, monitoring = handled)

Platform Comparison at a Glance

Feature	AWS Fargate	Google Cloud Run	Azure Container Apps
Scale to zero	No (ECS) / Yes (Lambda containers)	Yes	Yes
Cold start	30-60s (ECS)	1-5s	2-10s
Max timeout	Unlimited (ECS)	60 min	30 min
Max memory	120 GB	32 GB	4 GB
Max vCPUs	16	8	4
GPU support	Yes (limited)	Yes (L4, A100)	No
Min instances	1+ (ECS)	0	0
Pricing model	Per vCPU-second + memory	Per request + vCPU/memory	Per vCPU-second + memory
Free tier	None	2M requests/month	180K vCPU-s/month

AWS Fargate: The Enterprise Workhorse

Fargate is AWS's serverless compute engine for containers, running on either ECS (Elastic Container Service) or EKS (Elastic Kubernetes Service). It is the most mature and feature-rich option, but also the most complex. In practice, teams adopt Fargate not because it is the simplest, but because it slots into an existing AWS estate — the same VPCs, the same IAM roles, the same observability tooling they already run.

# ECS Task Definition for Fargate
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: my-api
      RequiresCompatibilities: [FARGATE]
      NetworkMode: awsvpc
      Cpu: "1024"        # 1 vCPU
      Memory: "2048"     # 2 GB
      ExecutionRoleArn: !GetAtt ExecutionRole.Arn
      TaskRoleArn: !GetAtt TaskRole.Arn
      ContainerDefinitions:
        - Name: api
          Image: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/my-api:latest"
          PortMappings:
            - ContainerPort: 8080
              Protocol: tcp
          Environment:
            - Name: SPRING_PROFILES_ACTIVE
              Value: production
          Secrets:
            - Name: DATABASE_URL
              ValueFrom: !Ref DatabaseSecret
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/my-api
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: api
          HealthCheck:
            Command: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
            Interval: 30
            Timeout: 5
            Retries: 3

  Service:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref Cluster
      ServiceName: my-api
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 2
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets: [!Ref PrivateSubnet1, !Ref PrivateSubnet2]
          SecurityGroups: [!Ref ServiceSG]
      LoadBalancers:
        - ContainerName: api
          ContainerPort: 8080
          TargetGroupArn: !Ref TargetGroup

Notice how much surrounding infrastructure this single service implies: an execution role, a task role, two subnets, a security group, a target group, and a log group. That verbosity is the cost of Fargate's flexibility. On the upside, every one of those resources is a first-class, auditable AWS object, which is exactly what compliance-heavy organizations want.

Fargate strengths:

Deep AWS ecosystem integration (ALB, VPC, IAM, CloudWatch, Secrets Manager)
EKS support for Kubernetes-native workflows
Best for long-running services that need consistent capacity
Up to 120 GB memory and 16 vCPUs per task

Fargate weaknesses:

No request-driven scale-to-zero on ECS (a service that must answer traffic needs at least one running task)
Complex networking setup (VPC, subnets, security groups required)
Slow cold starts (30–60 seconds for ECS tasks)
Pricing can be expensive for bursty workloads

Google Cloud Run: The Developer Favorite

Cloud Run takes the opposite approach to Fargate — radical simplicity. Deploy a container with a single command:

# Build and deploy in one step
gcloud run deploy my-api \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --min-instances 0 \
  --max-instances 100 \
  --memory 512Mi \
  --cpu 1 \
  --timeout 300 \
  --set-env-vars "NODE_ENV=production"

That is it. Cloud Run builds the container, pushes it to Artifact Registry, deploys the service, provisions an HTTPS endpoint on a run.app URL, and configures auto-scaling (a custom domain needs a separate mapping step). No VPC, no load balancer configuration, and no IAM roles to set up for a basic public service.

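// A minimal Cloud Run service (any HTTP server works)
import express from "express";

const app = express();
app.use(express.json());

// Placeholder for the real work; replace with your own logic.
async function heavyComputation(payload) {
  return { received: payload };
}

app.get("/api/health", (req, res) => {
  // Cloud Run sets K_SERVICE and K_REVISION; it does not expose the region as an env var.
  res.json({
    status: "healthy",
    service: process.env.K_SERVICE,
    revision: process.env.K_REVISION,
  });
});

app.post("/api/process", async (req, res) => {
  try {
    const result = await heavyComputation(req.body);
    res.json(result);
  } catch (err) {
    // Express 4 does not catch rejections from async handlers on its own.
    res.status(500).json({ error: "processing failed" });
  }
});

// Cloud Run sets the PORT environment variable
const port = Number(process.env.PORT) || 8080;
app.listen(port, () => console.log(`Listening on port ${port}`));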

Cloud Run strengths:

True scale-to-zero (zero cost when idle)
Fast cold starts (1–5 seconds, sub-second with min instances)
Simplest developer experience of all three
Built-in HTTPS, custom domains, traffic splitting
GPU support for ML inference workloads (L4 and A100)
Generous free tier (2 million requests/month)

Cloud Run weaknesses:

60-minute maximum request timeout
Limited to 32 GB memory, 8 vCPUs
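No persistent local disk (the instance filesystem is in-memory; durable data needs a mounted Cloud Storage bucket, an NFS share, or an external database)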
Less granular networking controls compared to Fargate

Concurrency: The Setting That Changes Everything

One detail dominates both cost and latency on Cloud Run and Container Apps, yet it rarely appears in comparison tables: per-instance concurrency. On Cloud Run, a single container instance can handle many simultaneous requests: the default limit is 80, and you can raise it to as many as 1,000. This is fundamentally different from the one-request-per-instance model that classic FaaS uses, and it is why Cloud Run is so cost-effective for I/O-bound web services.

# Tune concurrency to match your workload profile.
# (Comments sit on their own lines: text after a trailing backslash breaks the command.)

# I/O-bound JSON API: pack many requests per instance.
gcloud run deploy my-api \
  --source . \
  --concurrency 80 \
  --cpu 1 --memory 512Mi

# CPU-bound image resizing or ML inference: serialize work.
# One heavy request per instance avoids CPU starvation.
gcloud run deploy image-worker \
  --source . \
  --concurrency 1 \
  --cpu 4 --memory 8Gi

The trade-off is concrete. A high concurrency value means fewer instances, lower cost, and fewer cold starts, but a single slow request can now contend for CPU with its neighbors. Conversely, setting concurrency to 1 gives each request a dedicated instance, which is ideal for CPU-bound or memory-hungry tasks, at the price of more instances and a higher bill. A practical rule of thumb: load-test a representative request mix, then set concurrency just below the point where p99 latency starts to climb.
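To put rough numbers on that trade-off, the sketch below estimates how many instances a given load needs using Little's law (requests in flight = arrival rate × average latency). The helper name `estimateInstances` and the sample figures are illustrative assumptions, not a platform API, and real autoscalers also react to CPU utilization, so treat the result as a lower bound rather than a prediction.

```javascript
// Rough instance estimate from Little's law. Hypothetical helper, not a Cloud Run API.
function estimateInstances(requestsPerSecond, avgLatencyMs, concurrency) {
  if (concurrency < 1) throw new RangeError("concurrency must be at least 1");
  // Average number of requests being served at any instant.
  const inFlight = (requestsPerSecond * avgLatencyMs) / 1000;
  // Each instance absorbs up to `concurrency` of those requests.
  return Math.ceil(inFlight / concurrency);
}

// 100 req/s at 200 ms average latency is about 20 requests in flight.
console.log(estimateInstances(100, 200, 80)); // packed: 1 instance
console.log(estimateInstances(100, 200, 1));  // serialized: 20 instances
```

The same load costs one instance or twenty depending on a single flag, which is why the setting deserves a deliberate choice.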

Cold Starts and How to Tame Them

Cold starts are the most over-discussed and under-measured property of these managed platforms. The number that matters is not the platform's advertised range but your own image's initialization time. A Go binary in a scratch image typically starts in well under a second; a Spring Boot fat JAR can take ten seconds or more before it serves its first request, regardless of platform.

There are three practical levers. First, shrink the image — multi-stage builds and distroless or alpine base images reduce pull time. Second, keep a small floor of warm instances during business hours so users never pay the cold-start tax. Third, defer expensive work out of the startup path. The pattern below sets a minimum instance count, which all three platforms support in some form.

# Keep one warm instance so users never hit a cold start.
# To scale to zero off-peak, run a scheduled update that sets --min-instances 0.
# --cpu-boost allocates extra CPU during startup to cut cold-start time.
gcloud run services update my-api \
  --min-instances 1 \
  --max-instances 100 \
  --cpu-boost

Be honest about the economics: a single always-warm instance erodes part of the scale-to-zero savings. For a low-traffic internal tool, accepting a two-second cold start is usually the right call. For a customer-facing checkout API, a warm floor is cheap insurance against abandoned carts.
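The third lever, deferring expensive work out of the startup path, can be sketched in a few lines. The example below is one possible approach, not the only one: it wraps an initializer so that it runs on first use and is shared by concurrent callers. The names `lazy`, `loadModel`, and `getModel` are hypothetical, and the "model" is a stand-in for whatever is slow in your service (a large file, a connection pool, a warm cache).

```javascript
// Wrap an expensive initializer so it runs once, on first use, not at startup.
function lazy(init) {
  let pending = null;
  return function get() {
    if (pending === null) {
      // The Promise executor calls init() immediately and turns a throw into a rejection.
      pending = new Promise((resolve) => resolve(init()));
      // If initialization fails, clear the cache so a later request can retry.
      pending.catch(() => { pending = null; });
    }
    return pending;
  };
}

let loads = 0;
const getModel = lazy(async function loadModel() {
  loads += 1; // stands in for reading a large file or opening a connection pool
  return { name: "demo-model" };
});

// Nothing has been loaded yet, so the server can start listening right away.
// The first two callers share one in-flight initialization.
const first = getModel();
const second = getModel();
first.then((model) => console.log(model.name, "loads:", loads));
```

The cost moves to the first request that needs the resource, so this suits dependencies that only some endpoints use; for something every request needs, a warm instance is usually the better fix.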

Azure Container Apps: The Kubernetes-Powered Middle Ground

Azure Container Apps (ACA) runs on Kubernetes (via KEDA and Envoy) but hides the complexity:

# Deploy with Azure CLI
az containerapp create \
  --name my-api \
  --resource-group my-rg \
  --environment my-env \
  --image myregistry.azurecr.io/my-api:latest \
  --target-port 8080 \
  --ingress external \
  --min-replicas 0 \
  --max-replicas 30 \
  --cpu 1.0 \
  --memory 2.0Gi \
  --env-vars "ASPNETCORE_ENVIRONMENT=Production" \
  --secrets "db-conn=keyvaultref:https://my-vault.vault.azure.net/secrets/db-conn"

ACA's killer feature is built-in Dapr integration for microservices patterns:

# Container App with Dapr sidecar
properties:
  configuration:
    dapr:
      enabled: true
      appId: order-service
      appPort: 8080
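    secrets:
      - name: db-connection
        # Full URI of the Key Vault secret, not just the vault
        keyVaultUrl: https://my-vault.vault.azure.net/secrets/db-connection
        identity: system
      - name: queue-connection   # referenced by the queue-scaling rule below
        keyVaultUrl: https://my-vault.vault.azure.net/secrets/queue-connection
        identity: system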
    ingress:
      external: true
      targetPort: 8080
      traffic:
        - latestRevision: true
          weight: 80
        - revisionName: order-service--v2
          weight: 20  # Canary deployment
  template:
    scale:
      minReplicas: 0
      maxReplicas: 30
      rules:
        - name: http-scaling
          http:
            metadata:
              concurrentRequests: "50"
        - name: queue-scaling
          azureQueue:
            queueName: orders
            queueLength: 10
            auth:
              - secretRef: queue-connection
                triggerParameter: connection

The queue-scaling rule above is what sets ACA apart for event-driven systems. Because scaling is powered by KEDA, you are not limited to HTTP request count — you can scale on the depth of an Azure Storage Queue, a Service Bus topic, Kafka lag, or a custom Prometheus metric. A worker that drains a backlog and then scales back to zero is a one-block configuration change rather than a custom autoscaler.
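As a rough mental model, a queue-based rule divides the backlog by the per-replica target and clamps the result to the configured bounds. The sketch below is a simplification of what KEDA and the Kubernetes autoscaler do (it ignores polling intervals, cooldown periods, and stabilization windows), and `desiredReplicas` is a hypothetical helper rather than an Azure or KEDA API.

```javascript
// Simplified view of queue-driven scaling: backlog / target, clamped to [min, max].
function desiredReplicas(queueDepth, targetPerReplica, minReplicas, maxReplicas) {
  if (targetPerReplica <= 0) throw new RangeError("targetPerReplica must be positive");
  const wanted = Math.ceil(queueDepth / targetPerReplica);
  return Math.min(maxReplicas, Math.max(minReplicas, wanted));
}

// Values mirror the rule above: queueLength 10, minReplicas 0, maxReplicas 30.
console.log(desiredReplicas(0, 10, 0, 30));    // empty queue: 0 replicas
console.log(desiredReplicas(95, 10, 0, 30));   // backlog of 95: 10 replicas
console.log(desiredReplicas(1000, 10, 0, 30)); // large backlog: capped at 30
```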

ACA strengths:

Scale-to-zero with KEDA-based scaling (HTTP, queue, custom metrics)
Dapr integration for service invocation, pub/sub, state management
Traffic splitting for canary/blue-green deployments built in
Azure Key Vault integration for secrets
Job support for batch processing

ACA weaknesses:

Limited to 4 vCPUs, 4 GB memory per container (lowest of the three)
No GPU support
Younger platform with smaller community
Azure ecosystem lock-in

Cost Comparison: Real-World Scenarios

Scenario 1: API with steady traffic (100 req/s, 24/7)

Platform	Monthly Cost	Notes
Fargate (1 vCPU, 2GB)	~$75	Always-on, no scale-to-zero
Cloud Run (1 vCPU, 512MB)	~$45	Scales to match traffic
Container Apps (1 vCPU, 2GB)	~$55	Scales to match traffic

Scenario 2: Bursty API (0-500 req/s, active 8h/day)

Platform	Monthly Cost	Notes
Fargate (1 vCPU, 2GB)	~$75	Pays full price 24/7
Cloud Run (auto-scale)	~$18	Scales to zero overnight
Container Apps (auto-scale)	~$22	Scales to zero overnight

Scenario 3: Background job (runs 2h/day, needs 4 vCPU, 8GB)

Platform	Monthly Cost	Notes
Fargate (scheduled task)	~$35	EventBridge schedule + Fargate
Cloud Run Jobs	~$12	Built-in job scheduler
Container Apps Jobs	~$14	Built-in job support

These figures are representative list-price estimates rather than measured bills; your actual cost depends on region, committed-use discounts, and egress. Still, the shape holds: Cloud Run wins on cost for bursty and intermittent workloads thanks to true scale-to-zero and per-request billing, while Fargate's always-on bill, the highest of the three even in the steady-traffic scenario, is easiest to justify for always-on services where predictable capacity matters more than the lowest price. One frequently overlooked line item is network egress: data leaving any of these platforms is billed separately and can dwarf compute cost for media-heavy or chatty services, so model it before you commit.
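To reproduce the shape of these tables for your own workload, a small model is enough. The sketch below is a deliberately simplified estimate that covers only vCPU-seconds and memory-seconds; it ignores per-request fees, free tiers, egress, and discounts. The function name, the field names, and especially the rates are made-up placeholders, so substitute the current list prices for your region before drawing conclusions.

```javascript
// Simplified monthly compute cost. Rates are caller-supplied; the ones used
// below are placeholders, not real prices for any provider.
function monthlyComputeCost({ vcpu, memoryGiB, activeHoursPerDay, daysPerMonth = 30,
                              ratePerVcpuSecond, ratePerGiBSecond }) {
  const activeSeconds = activeHoursPerDay * 3600 * daysPerMonth;
  const cpuCost = vcpu * activeSeconds * ratePerVcpuSecond;
  const memoryCost = memoryGiB * activeSeconds * ratePerGiBSecond;
  return cpuCost + memoryCost;
}

const placeholderRates = { ratePerVcpuSecond: 0.00001, ratePerGiBSecond: 0.000001 };
const shape = { vcpu: 1, memoryGiB: 2 };

// Same container shape, billed around the clock versus only while active.
const alwaysOn = monthlyComputeCost({ ...shape, ...placeholderRates, activeHoursPerDay: 24 });
const eightHours = monthlyComputeCost({ ...shape, ...placeholderRates, activeHoursPerDay: 8 });

console.log(alwaysOn.toFixed(2), eightHours.toFixed(2));
console.log((eightHours / alwaysOn).toFixed(2)); // billed time drives the ratio
```

With identical rates, the bill scales with billed seconds, which is the whole argument for scale-to-zero on workloads that sit idle most of the day.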

When NOT to Use These Platforms

Managed container platforms are not a universal answer, and choosing them reflexively leads to predictable regret. Avoid them in the following situations. First, stateful workloads (databases, message brokers, anything that wants a stable identity and a persistent local disk) fit poorly because instances are ephemeral and, on Cloud Run, the local filesystem is in-memory and disappears with the instance. Second, ultra-low-latency services with strict, single-digit-millisecond p99 budgets can be undermined by cold starts and scheduling jitter; a provisioned cluster gives more predictable tail latency.

Third, very large or GPU-clustered training jobs exceed the per-instance ceilings (especially ACA's 4 vCPU / 4 GB limit) and belong on dedicated batch or Kubernetes infrastructure. Fourth, if you genuinely need fine-grained control over the kernel, sidecars, node placement, or custom networking, managed serverless abstracts away the very knobs you require — a full Kubernetes cluster is the honest choice. Finally, sustained, perfectly steady traffic often runs cheaper on reserved VMs or savings-plan-backed instances than on per-second serverless billing. The trade-off is always the same: you exchange control and, sometimes, steady-state cost for operational simplicity.

Decision Framework

Choose Fargate when:

You are already invested in the AWS ecosystem
You need more than 32 GB memory or 8 vCPUs
Your service must be always-on with consistent capacity
You need deep VPC integration and private networking
You want Kubernetes (EKS on Fargate)

Choose Cloud Run when:

You want the simplest possible deployment experience
Your traffic is bursty or unpredictable
Cost optimization is a priority (scale-to-zero)
You need GPU inference capabilities
You are a startup or small team that values speed over customization

Choose Container Apps when:

You need microservices patterns (Dapr, pub/sub, state management)
You want KEDA-based scaling on queues and custom metrics
You are building on Azure and want tight ecosystem integration
You need canary deployments and traffic splitting out of the box
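The checklists above can be read as an ordered set of rules, with hard limits checked before preferences. The sketch below is one possible encoding, offered as a thinking aid rather than a recommendation engine: `suggestPlatform` and the workload field names are hypothetical, the thresholds come from the comparison table earlier in this article, and the rule order is a judgment call you may want to change.

```javascript
// Encodes the decision checklists as ordered rules. Hypothetical helper.
function suggestPlatform(workload) {
  const { memoryGiB = 1, vcpus = 1, alwaysOn = false, needsDapr = false,
          scalesOnQueues = false, cloud = null } = workload;

  // Hard limits first: only Fargate goes past 32 GB / 8 vCPUs in the table.
  if (memoryGiB > 32 || vcpus > 8) return "Fargate";
  // Microservice building blocks and event-driven scaling point to Container Apps.
  if (needsDapr || scalesOnQueues) return "Container Apps";
  // Existing ecosystem investment and always-on capacity come next.
  if (cloud === "aws" || alwaysOn) return "Fargate";
  if (cloud === "azure") return "Container Apps";
  // Default: bursty, cost-sensitive, or simplicity-first workloads.
  return "Cloud Run";
}

console.log(suggestPlatform({ memoryGiB: 64 }));
console.log(suggestPlatform({ scalesOnQueues: true }));
console.log(suggestPlatform({}));
```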

For further reading, refer to the AWS documentation and the Google Cloud documentation for comprehensive reference material. If you are weighing the broader trade-offs of moving compute off dedicated servers, our companion piece on serverless architecture patterns covers the application-design side in more depth.

All three platforms have matured significantly by 2026. The best choice depends less on technical capability and more on your team's existing cloud expertise and the specific operational requirements of your workload. Pick the platform that matches your ecosystem, set concurrency and minimum instances deliberately, and you will have a production-ready deployment in hours, not weeks.

Serverless containers reward deliberate setup: tune concurrency, manage cold starts, model true cost including egress, and know when to walk away. Start with the fundamentals, iterate on your implementation, and keep measuring so that the platform continues to earn its place in your stack.