Kubernetes Auto and Scheduled Scaling
Efficiently scale Kubernetes for robust, cost-effective cluster management
Kubernetes autoscaling helps clusters respond to changes in demand while reducing over-provisioning. When it is paired with scheduled scaling for predictable peaks, it can also reduce cost without sacrificing responsiveness.
Effective scaling in Kubernetes usually requires coordination across two layers:
- the pod layer, where the Horizontal Pod Autoscaler (HPA) adjusts replicas
- the node layer, where the Cluster Autoscaler (CA) adjusts available compute
Kubernetes Autoscaling Overview
Kubernetes supports a few different scaling mechanisms:
- Horizontal Pod Autoscaler (HPA) to scale replicas
- Cluster Autoscaler (CA) to scale nodes
- Vertical Pod Autoscaler (VPA) to adjust requested resources over time
Each one solves a different part of the scaling problem.
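Although this article focuses on HPA and CA, a minimal VPA manifest sketch helps show how it fits alongside them. This assumes the VPA components are installed in the cluster and reuses the hpa-example deployment from the next section:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hpa-example-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-example
  updatePolicy:
    updateMode: "Off"   # recommendation-only; "Auto" lets VPA evict and resize pods
```

Running VPA in "Off" mode first is a common way to review its recommendations before letting it act, and it avoids conflicts with an HPA scaling the same deployment on CPU.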
Autoscaling Pods with HPA
The HPA scales workloads based on resource usage or other metrics. A common starting point is CPU utilisation:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-example
spec:
  replicas: 1
  selector:
    matchLabels:
      run: hpa-example
  template:
    metadata:
      labels:
        run: hpa-example
    spec:
      containers:
      - name: hpa-example
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-example
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
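On current clusters, autoscaling/v2 is the recommended API version. It expresses the same CPU target through a metrics list, which also opens the door to memory, custom, and external metrics. An equivalent sketch of the HPA above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-example
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization         # percentage of the pod's CPU request
        averageUtilization: 50
```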
You can create the same HPA imperatively:
kubectl autoscale deployment hpa-example --min=1 --max=10 --cpu-percent=50
If you see errors such as FailedGetResourceMetric, make sure metrics-server is installed:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
To test HPA behaviour, generate load against the deployment:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-example; done"
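While the load generator runs, you can watch the HPA react as CPU utilisation climbs past the 50% target:

```shell
# Shows the current metric value and replica count, updating live
kubectl get hpa hpa-example --watch
```

After stopping the load generator, expect the replica count to return to the minimum only after the scale-down stabilisation window has passed; by default the HPA waits several minutes before removing replicas.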
Scaling Nodes with Cluster Autoscaler
The Cluster Autoscaler adds nodes when pods cannot be scheduled because of resource constraints and removes idle nodes after a period of inactivity.
Key behaviours include:
- scaling up when pods are unschedulable
- scaling down nodes that are no longer needed
- respecting the minimum and maximum size of the backing node group
On AWS, Cluster Autoscaler can discover Auto Scaling Groups by tag or be configured manually.
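For tag-based autodiscovery, each Auto Scaling Group is expected to carry the standard discovery tag keys (the tag values are not significant). A sketch using the AWS CLI, with placeholders to replace:

```shell
# Tag the ASG so Cluster Autoscaler can discover it by cluster name
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<ASG Name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
  "ResourceId=<ASG Name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<Cluster Name>,Value=owned,PropagateAtLaunch=false"
```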
Installation with Helm
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
Autodiscovery is often the cleanest route:
helm install autoscaler autoscaler/cluster-autoscaler \
--set awsRegion=eu-west-1 \
--set "autoDiscovery.clusterName=<Cluster Name>" \
--set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::<AWS Account ID>:role/<CA Role>
Scheduled Scaling
Scheduled scaling complements autoscaling by preparing for predictable peaks in demand.
Pod-Level Scheduling
For one-off changes you can scale a deployment directly:
kubectl scale --replicas=5 deployment/hpa-example
For recurring schedules, you can use a controller such as kube-schedule-scaler or a CronJob that applies the scale command:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up
spec:
  schedule: '0 8 * * *'
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scale
            image: bitnami/kubectl
            command:
            - /bin/sh
            - -c
            - kubectl scale --replicas=3 deployment/nginx
          restartPolicy: OnFailure
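Note that the CronJob's pod needs permission to scale the target deployment. A minimal RBAC sketch (the names here are illustrative; the ServiceAccount would be referenced via serviceAccountName in the pod spec above):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-scaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments/scale"]   # only the scale subresource, not the whole deployment
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-scaler
subjects:
- kind: ServiceAccount
  name: deployment-scaler
```

Granting access only to the deployments/scale subresource keeps the CronJob from being able to modify anything else about the workload.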
Node-Level Scheduling
For AWS-hosted clusters, you can also apply scheduled scaling policies to the underlying Auto Scaling Groups and still allow Cluster Autoscaler to manage day-to-day elasticity around those schedules.
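For example, a recurring scale-up of the node group ahead of the morning peak might look like the following AWS CLI sketch (placeholders and sizes are illustrative):

```shell
# Raise ASG capacity at 07:45 UTC on weekdays, ahead of the 08:00 pod-level scale-up
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name <ASG Name> \
  --scheduled-action-name morning-scale-up \
  --recurrence "45 7 * * MON-FRI" \
  --min-size 3 --max-size 10 --desired-capacity 5
```

Cluster Autoscaler continues to operate within the min/max bounds, so raising the minimum size effectively pre-warms capacity while leaving intraday elasticity to the autoscaler.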
Conclusion
Kubernetes offers strong native primitives for scaling, but the best results usually come from combining them. HPA handles changing workload demand, Cluster Autoscaler handles cluster capacity, and scheduled scaling prepares the platform for predictable peaks. Together, they create a more efficient and resilient operating model.