Cloud Run: min-instances 0 & CPU Throttling Safe?
Is it safe to use min-instances: 0 with --no-cpu-throttling on Google Cloud Run? Understand cold starts, cost savings, and operational recommendations.
Is it safe to set min-instances: 0 with --no-cpu-throttling for a Google Cloud Run background sync worker handling sporadic workloads?
Current Setup:
- Service: Background sync worker (Meta Ads API + file downloads)
- CPU: 4 cores
- Memory: 8Gi
- Flag: --no-cpu-throttling (always-on CPU allocation)
- min-instances: 1
- max-instances: 10
Traffic Pattern:
- Sporadic bursts triggered manually or on schedule
- Long idle periods (hours to days)
- Active jobs run 5-30 minutes (API calls + creative downloads)
Resource Utilization (Cloud Run metrics):
- CPU: 10-30% average, peaks 40-60%
- Memory: 5-10% average (~800MB-1.6GB of 8Gi)
- Instances: Usually 1, scales to 2-4 during heavy syncs
Key Concern: Need --no-cpu-throttling for performance, but paying for idle time with min-instances: 1. Can I switch to min-instances: 0 to scale to zero during idle periods, expect 4-6 second cold starts, retain full CPU once running, and save 70-80% on costs?
Yes — for a sporadic background sync worker it is generally safe to set min-instances: 0 with --no-cpu-throttling on Cloud Run, provided you can tolerate occasional cold starts and your jobs are triggered by HTTP requests or a scheduler. With min-instances: 0, Cloud Run scales to zero during long idle periods (so you stop paying for idle CPU), and --no-cpu-throttling gives the container full vCPU once it starts. Expect typical cold starts in the ~4–6s range, and use startup CPU boost, wake-up requests, or right-sizing to reduce the impact.
Contents
- Cloud Run + --no-cpu-throttling: what changes with min-instances:0
- Cold start impact and mitigations
- Cost tradeoffs: expected savings vs runtime cost
- Operational recommendations & safe migration steps
- When to keep min-instances:1 (or higher)
- Alternative patterns to consider
- Sources
- Conclusion
Cloud Run + --no-cpu-throttling: what changes with min-instances:0
Setting min-instances: 0 lets Cloud Run scale your service to zero when no requests arrive; scaling from zero is triggered only by an incoming request, so the service will not consume CPU while scaled-to-zero. The min-instances documentation explains how minimum instances control whether Cloud Run keeps containers warm, and the autoscaling doc clarifies that scaling-from-zero requires a request.
--no-cpu-throttling (or the run.googleapis.com/cpu-throttling: "false" annotation) removes request-only CPU throttling, so the instance receives full vCPU for as long as it exists. That is exactly why you need --no-cpu-throttling for CPU-heavy background work: the worker can keep computing between requests or during long request processing. The CPU configuration page documents startup CPU boost and how CPU allocation behaves.
Crucial combo: with min-instances: 0 + --no-cpu-throttling, you get zero idle-cost during long quiet periods (no instances running), and when Cloud Run starts an instance you get full CPU for the job. But you will pay for the full runtime (vCPU-seconds and memory-seconds) while the instance is alive. The billing settings page describes how CPU billing and instance lifecycle interact.
Example deploy/update command:
gcloud run deploy my-sync-worker \
  --image gcr.io/PROJECT/my-sync-worker:tag \
  --region us-central1 \
  --platform managed \
  --min-instances=0 \
  --max-instances=10 \
  --concurrency=1 \
  --no-cpu-throttling
Or set the equivalent annotation on the revision template in the service YAML:
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: "false"
Cold start impact and mitigations
What is a cold start on Cloud Run? It is the elapsed time to schedule the instance, pull the image (if needed), start the container, and let your application initialize. Totals typically fall in the 2–8s range, and many real-world cases report ~4–6s for non-trivial apps. See the practical breakdown and optimization checklist in the cold-start optimization summary and in 3 ways to optimize Cloud Run response times.
Mitigations you can and should use:
- Enable startup CPU boost where appropriate to shorten runtime startup phases; the CPU doc covers that option.
- Pre-warm using a scheduled wake-up request: have Cloud Scheduler or another orchestrator call a lightweight endpoint (e.g., /health or /_warm) 30–90s before the sync job starts to give the instance time to initialize. The autoscaling docs explicitly recommend a wake-up request for workloads that cannot tolerate scale-from-zero delay (see the sketch after this list).
- Reduce application init time: lazy-load heavy libraries, defer non-essential I/O, cache credentials, and avoid blocking network calls on startup.
- Shrink your image and runtime dependencies: smaller images and simpler init logic reduce image-pull and startup time.
- Tune concurrency: for long CPU-bound sync jobs set --concurrency=1 so a single job gets the full vCPU rather than competing with other in-flight requests. If your job is I/O bound and safe to parallelize, higher concurrency reduces instance count.
- Add retries with backoff on the trigger path so a transient failure or a slow cold start won't cause job loss.
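As a concrete sketch of the first two mitigations, assuming the service from the deploy example above (service name, region, URL, schedule, and service account are placeholders):

# Enable startup CPU boost to shorten the runtime-init phase of cold starts.
gcloud run services update my-sync-worker \
  --region=us-central1 \
  --cpu-boost

# Hypothetical wake-up job: call a lightweight endpoint ~60s before a 02:00 sync
# so the instance is already initialized when the heavy work begins.
gcloud scheduler jobs create http warm-sync-worker \
  --location=us-central1 \
  --schedule="59 1 * * *" \
  --uri="https://my-sync-worker-HASH-uc.a.run.app/_warm" \
  --http-method=GET \
  --oidc-service-account-email=scheduler-invoker@PROJECT.iam.gserviceaccount.com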
If your scheduled job must start at an exact second (no tolerance), pre-warm or keep a small min-instances value, because scale-from-zero can add several seconds of unpredictable extra latency.
Cost tradeoffs: expected savings vs runtime cost
Switching min-instances to zero eliminates the cost of an always-warm instance between runs. The Cloud Run pricing page shows you pay for vCPU-seconds and memory-seconds while instances exist; when scaled to zero you pay nothing for CPU or memory. Practical cost analyses for background workers show large savings: for very infrequent workloads, one guide reports ~70–80% savings from switching to min-instances: 0 while using --no-cpu-throttling so the worker gets full CPU only while running (see the pricing walkthrough at Pump).
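A back-of-envelope comparison of billable instance time illustrates why the savings are large. The figures below are assumptions (one 30-minute run per day at 4 vCPU); multiply by the current per-second rates on the pricing page for actual dollars:

# Always-warm 4 vCPU instance, 30-day month:
echo $((4 * 86400 * 30))   # 10,368,000 vCPU-seconds
# Scale-to-zero, one 30-minute 4 vCPU run per day:
echo $((4 * 1800 * 30))    # 216,000 vCPU-seconds, roughly 2% of the always-warm total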
Tradeoffs you must evaluate:
- Cold-start latency (time cost) vs. idle-instance charges (money cost). For your workload (runs lasting 5–30 minutes, hours/days idle), the overhead of a 4–6s cold start is a small fraction of job runtime, so cost savings usually outweigh the latency impact.
- Active-run cost: --no-cpu-throttling won't increase charges while scaled to zero, but while the instance is running you are billed for the full vCPU allocation. Right-sizing vCPU (see below) can reduce active-run cost.
Right-sizing suggestion: your metrics (10–30% average CPU, peaks 40–60% on 4 vCPU) suggest 4 vCPU might be over-provisioned for most runs. Because Cloud Run requires integer vCPUs for values >1, try 2 vCPU + --no-cpu-throttling and --concurrency=1 in a staging test to measure job run time. If runtime stays within acceptable bounds, you cut active cost substantially while preserving performance.
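A minimal sketch of that staging experiment, assuming a separate staging copy of the service named my-sync-worker-staging in us-central1 (both names are placeholders):

gcloud run services update my-sync-worker-staging \
  --region=us-central1 \
  --cpu=2 \
  --memory=8Gi \
  --concurrency=1 \
  --no-cpu-throttling
# Trigger a representative set of sync jobs and compare p50/p95 job durations
# against the 4 vCPU baseline before changing production.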
Operational recommendations & safe migration steps
- Start in staging. Deploy an identical service with min-instances: 0 and --no-cpu-throttling and run a representative set of sync jobs. Measure cold start (time until first meaningful work), p50/p95 job durations, and error/timeout rates.
- Measure the cold-start distribution. Log and track the time from trigger to first work using Cloud Logging and Cloud Monitoring. If p95 cold start exceeds your SLA, add mitigations (pre-warm, startup CPU boost, or keep a small min-instances).
- Implement a wake-up request for scheduled runs. For scheduled syncs, use Cloud Scheduler to POST a lightweight request to a warm endpoint 30–90s before the heavy job starts. This is cheaper than an always-on instance and reliable because scaling-from-zero is request-driven (see about instance autoscaling).
- Right-size CPU/memory. Run a controlled experiment at 2 vCPU and 8Gi memory (or lower memory if underused). If your worker is single-threaded, reducing to 1–2 vCPU will likely be cost-effective. The CPU docs cover fractional vCPU options below 1 and the integer requirement above 1 vCPU.
- Tune concurrency. Set --concurrency=1 for CPU-bound jobs to ensure full CPU per job; increase concurrency only if jobs are I/O bound and safely parallelizable.
- Add monitoring and alerts. Track cold-start counts, startup time, job-duration SLO, 5xx errors, and retry rates. Create an alert so you can flip min-instances back to 1 if failures or latency spike.
- Have a rollback plan. If production shows unacceptable latency or failures after migration, revert to min-instances: 1 (a one-line update; see the command after this list) while you refine warm-up and right-sizing strategies.
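The rollback itself is a single update (same placeholder service name and region as in the earlier examples):

gcloud run services update my-sync-worker \
  --region=us-central1 \
  --min-instances=1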
Example safe rollout plan:
- Week 1 (staging): test with min-instances: 0.
- Week 2 (canary prod): route 5–10% of jobs to the new service; measure.
- Week 3 (full prod): move all jobs and keep monitoring for 1–2 weeks.
When to keep min-instances:1 (or higher)
Keep min-instances: 1 (or >1) if any of these are true:
- You need deterministic, near-zero start latency (sub-second) for scheduled jobs or user-facing workflows.
- Jobs are frequent (every few minutes) so repeated cold starts would cost more in latency and retry overhead than a warm instance.
- Your workload maintains in-memory state, long-lived connections, or external sessions that cannot be re-established quickly.
- You rely on precise timing (e.g., financial cutoffs) where a 4–6s variance is unacceptable.
If any of these apply, keeping min-instances at 1 (or slightly higher) is simpler and more reliable.
Alternative patterns to consider
- Cloud Run Jobs: if your background syncs are truly batch jobs and you prefer a job model (run-to-completion without an HTTP request lifecycle), evaluate Cloud Run Jobs or other batch services (see the sketches after this list).
- Pub/Sub + Cloud Run or Cloud Tasks: trigger jobs via push messages so triggers are reliable and retryable (also sketched after this list).
- Small always-on VM or managed instance group: when you need absolute start-determinism and you have many short tasks, a low-cost VM might be cheaper and simpler.
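Hedged sketches of the first two alternatives; the job name, topic, subscription, service URL, and invoker service account are placeholders:

# Cloud Run Jobs: run-to-completion execution without an HTTP request lifecycle.
gcloud run jobs create sync-worker-job \
  --image gcr.io/PROJECT/my-sync-worker:tag \
  --region us-central1 \
  --cpu=2 --memory=8Gi --task-timeout=45m
gcloud run jobs execute sync-worker-job --region us-central1

# Pub/Sub push trigger: publishing to the topic invokes the worker, and Pub/Sub
# retries delivery if the scale-from-zero request fails.
gcloud pubsub topics create sync-jobs
gcloud pubsub subscriptions create sync-jobs-push \
  --topic=sync-jobs \
  --push-endpoint="https://my-sync-worker-HASH-uc.a.run.app/run-sync" \
  --push-auth-service-account=pubsub-invoker@PROJECT.iam.gserviceaccount.com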
Sources
- https://docs.cloud.google.com/run/docs/configuring/min-instances
- https://cloud.google.com/run/docs/configuring/billing-settings
- https://docs.cloud.google.com/run/docs/configuring/services/cpu
- https://docs.cloud.google.com/run/docs/about-instance-autoscaling
- https://cloud.google.com/run/docs/tips/general
- https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation
- https://cloud.google.com/blog/topics/developers-practitioners/3-ways-optimize-cloud-run-response-times
- https://deployu.ai/interviews/gcp-interview/fix-cloud-run-cold-starts
- https://www.pump.co/blog/google-cloud-run-pricing
- https://cloud.google.com/run/pricing
Conclusion
For your background sync worker (5–30 minute jobs, long idle windows), switching to min-instances: 0 with --no-cpu-throttling on Cloud Run is a practical and cost-effective choice: you avoid idle CPU charges while preserving full vCPU during runs. Expect ~4–6s cold starts; mitigate with startup CPU boost, wake-up requests, and right‑sizing (e.g., test 2 vCPU + concurrency=1). If you need deterministic start times or very low latency, keep a small min-instances instead.