
Why Scaling Fails – Episode 2
Auto-scaling was enabled.
Peak traffic hit.
Pods didn’t scale. Why?
Short answer:
Because Kubernetes scales on signals — not traffic.
Here’s what actually happened 👇
1️⃣ HPA doesn’t monitor “traffic”
Most teams configure HPA based on:
* CPU utilization
* Memory usage
* Custom metrics (if explicitly set)
If traffic spikes but CPU doesn’t cross the threshold immediately,
HPA does nothing.
Traffic ≠ CPU instantly.
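A minimal CPU-based HPA looks like this (autoscaling/v2; the Deployment name and the 60% target are hypothetical). Notice that nothing in it mentions requests per second:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical target
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale only when average CPU crosses 60%
```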
2️⃣ Metrics lag
HPA checks metrics periodically (usually ~15 seconds).
Add:
* Metrics Server delay
* Aggregation delay
By the time the scaling decision happens,
your system is already stressed.
Auto-scaling is reactive.
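You can't remove the lag, but the `behavior` field in autoscaling/v2 controls how aggressively HPA acts once it does see the signal. A sketch, with illustrative values:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # no smoothing on the way up: act on the latest sample
    policies:
      - type: Percent
        value: 100                  # allow doubling the replica count
        periodSeconds: 15           # ...every 15 seconds
```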
3️⃣ Scaling takes time
Even after HPA decides to scale:
* Pod scheduled
* Container image pulled
* Application boots
* Readiness probe passes
That can take 20–60 seconds.
Peak traffic might already be causing failures.
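The steps above add up to a rough delay budget. All numbers here are illustrative; yours depend on scrape intervals, image size, boot time, and probe settings:

```python
# Rough end-to-end budget for a reactive scale-up.
# Every value is illustrative, not a Kubernetes default guarantee.

delays = {
    "metrics_scrape_and_aggregation": 30,  # Metrics Server scrape + aggregation
    "hpa_sync_period": 15,                 # HPA evaluation loop (~15s by default)
    "pod_scheduling": 5,
    "image_pull": 20,                      # near zero if the image is cached on the node
    "app_boot": 10,
    "readiness_probe_pass": 10,
}

total = sum(delays.values())
print(f"Time to first useful new pod: ~{total}s after the spike began")
```

A minute and a half of peak traffic can land before the first new pod serves a request.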
4️⃣ Cluster had no capacity
HPA can request more pods.
But if:
* Node pool max size reached
* Resource quota exceeded
* No available CPU/memory
Pods stay in Pending state.
Scaling request ≠ scaling success.
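A toy model of why requested pods sit in Pending: the HPA sets a desired replica count, but the scheduler can only place pods where CPU actually exists. All numbers below are made up for illustration:

```python
# Toy capacity model: HPA *requests* pods, the scheduler needs room for them.

node_cpu_capacity = 4.0   # CPUs per node (illustrative)
nodes = 3                 # node pool already at its max size
pod_cpu_request = 1.0     # CPU requested per pod

desired_pods = 16         # what the HPA asked for
schedulable = int(nodes * node_cpu_capacity / pod_cpu_request)  # room for 12

scheduled = min(desired_pods, schedulable)
pending = desired_pods - scheduled
print(f"{scheduled} pods Running, {pending} stuck Pending")
```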
5️⃣ Wrong metric selection
Maybe your bottleneck was:
* Database connections
* Thread pool exhaustion
* I/O wait
* External API dependency
If CPU was low,
HPA had no reason to scale.
Even if users were complaining.
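If the real bottleneck shows up in a metric like in-flight requests, HPA can scale on that instead, but only if the metric is exposed through the custom metrics API (for example via a Prometheus Adapter). The metric name and target below are hypothetical:

```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_in_flight   # hypothetical; must be served by a metrics adapter
      target:
        type: AverageValue
        averageValue: "100"             # scale out above ~100 in-flight requests per pod
```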
🧠 Real insight:
Auto-scaling reacts to pressure.
It doesn’t predict it.
If you expect traffic spikes:
* Pre-scale manually
* Use predictive scaling
* Scale on meaningful custom metrics
* Fix deeper bottlenecks first
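One concrete form of pre-scaling: raise the HPA's floor before a known spike so capacity is already warm. Values are illustrative:

```yaml
spec:
  minReplicas: 20   # temporary floor before the sale (normally e.g. 3)
  maxReplicas: 50   # HPA can still scale above the floor if needed
```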
Seen this during a sale or traffic spike?
Comment “HPA” 👇
Next episode:
Why scaling to 100 pods made performance worse.
#tech #systemdesign #developer #backenddeveloper
#softwareengineering
(Kubernetes auto scaling
HPA not scaling
Horizontal Pod Autoscaler
Scaling issues in Kubernetes
Peak traffic scaling
Production latency
System design interview
Distributed systems
Backend performance
DevOps debugging
Production incident)
@sanskriti_malik11










