

Kubernetes Auto and Scheduled Scaling

Efficiently scale Kubernetes for robust, cost-effective cluster management

January 1, 2020 · Platform Engineering · 4 min read

Kubernetes autoscaling helps clusters respond to changes in demand while reducing over-provisioning. When it is paired with scheduled scaling for predictable peaks, it can also reduce cost without sacrificing responsiveness.

Effective scaling in Kubernetes usually requires coordination across two layers:

  • the pod layer, where the Horizontal Pod Autoscaler (HPA) adjusts replicas
  • the node layer, where the Cluster Autoscaler (CA) adjusts available compute

Kubernetes Autoscaling Overview

Kubernetes supports a few different scaling mechanisms:

  • Horizontal Pod Autoscaler (HPA) to scale replicas
  • Cluster Autoscaler (CA) to scale nodes
  • Vertical Pod Autoscaler (VPA) to adjust requested resources over time

Each one solves a different part of the scaling problem.
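The VPA is worth a brief illustration, since it is the least familiar of the three. It is not part of a default cluster; its components live in the kubernetes/autoscaler project and must be installed separately. Once they are running, a minimal manifest might look like this sketch (the object name hpa-example-vpa is illustrative, targeting the hpa-example Deployment used later):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hpa-example-vpa   # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-example
  updatePolicy:
    # "Off" records recommendations without evicting pods;
    # "Auto" lets the VPA apply them by recreating pods.
    updateMode: "Off"
```

Note that running VPA in Auto mode on CPU or memory alongside an HPA that targets the same metric can cause the two controllers to fight; the VPA documentation recommends not combining them on the same resource metric.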

Autoscaling Pods with HPA

The HPA scales workloads based on resource usage or other metrics. A common starting point is CPU utilisation:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-example
spec:
  replicas: 1
  selector:
    matchLabels:
      run: hpa-example
  template:
    metadata:
      labels:
        run: hpa-example
    spec:
      containers:
        - name: hpa-example
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 200m
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-example
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
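On current clusters the autoscaling/v2 API is generally preferred: it expresses the same CPU target through a metrics list and also supports memory and custom metrics. A sketch of the equivalent manifest:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-example
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Target 50% of the CPU requested by the pods
          averageUtilization: 50
```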

You can create the same HPA imperatively:

kubectl autoscale deployment hpa-example --min=1 --max=10 --cpu-percent=50

If you see errors such as FailedGetResourceMetric, make sure metrics-server is installed:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
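Once metrics-server is running, the metrics API should start responding within a minute or so. A quick way to confirm:

```shell
# Both commands fail until metrics-server is serving the metrics API
kubectl top nodes
kubectl top pods
```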

To test HPA behaviour, generate load against the deployment:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-example; done"
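The load generator resolves hpa-example by DNS, which requires a Service in front of the Deployment; the manifests above do not create one. A minimal sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hpa-example
spec:
  selector:
    run: hpa-example   # matches the Deployment's pod labels
  ports:
    - port: 80
      targetPort: 80
```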

Scaling Nodes with Cluster Autoscaler

The Cluster Autoscaler adds nodes when pending pods cannot be scheduled because of insufficient resources, and removes underutilised nodes after a configurable period of inactivity (10 minutes by default).

Key behaviours include:

  • scaling up when pods are unschedulable
  • scaling down nodes that are no longer needed
  • respecting the minimum and maximum size of the backing node group

On AWS, Cluster Autoscaler can discover Auto Scaling Groups by tag or be configured manually.
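For tag-based autodiscovery, the backing Auto Scaling Group needs the two well-known Cluster Autoscaler tags; for example (substitute your own ASG and cluster names):

```shell
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<ASG Name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
  "ResourceId=<ASG Name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/<Cluster Name>,Value=owned,PropagateAtLaunch=false"
```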

Installation with Helm

helm repo add autoscaler https://kubernetes.github.io/autoscaler

Autodiscovery is often the cleanest route:

helm install autoscaler autoscaler/cluster-autoscaler \
  --set awsRegion=eu-west-1 \
  --set "autoDiscovery.clusterName=<Cluster Name>" \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::<AWS Account ID>:role/<CA Role>

Scheduled Scaling

Scheduled scaling complements autoscaling by preparing for predictable peaks in demand.

Pod-Level Scheduling

For one-off changes you can scale a deployment directly:

kubectl scale --replicas=5 deployment/hpa-example

For recurring schedules, you can use a controller such as kube-schedule-scaler or a CronJob that applies the scale command:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up
spec:
  schedule: '0 8 * * *'
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: scale
              image: bitnami/kubectl
              command:
                - /bin/sh
                - -c
                - kubectl scale --replicas=3 deployment/nginx
          restartPolicy: OnFailure
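One caveat with the CronJob approach: kubectl inside the pod authenticates as the pod's service account, and the default service account has no permission to scale deployments. A sketch of the RBAC this needs (scale-bot is an illustrative name; reference it with serviceAccountName: scale-bot in the CronJob's pod spec):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scale-bot   # illustrative name
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scale-bot
rules:
  # kubectl scale reads the deployment, then patches its scale subresource
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/scale"]
    verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scale-bot
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: scale-bot
subjects:
  - kind: ServiceAccount
    name: scale-bot
```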

Node-Level Scheduling

For AWS-hosted clusters, you can also apply scheduled scaling policies to the underlying Auto Scaling Groups and still allow Cluster Autoscaler to manage day-to-day elasticity around those schedules.
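On AWS this takes the form of a scheduled action on the ASG; for example, adding baseline capacity on weekday mornings (names and sizes here are placeholders, and the bounds should stay within the node-group limits Cluster Autoscaler is configured with):

```shell
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name "<ASG Name>" \
  --scheduled-action-name scale-up-weekday-mornings \
  --recurrence "0 8 * * MON-FRI" \
  --min-size 3 --max-size 10 --desired-capacity 5
```

The recurrence expression is interpreted in UTC unless a time zone is specified on the scheduled action.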

Conclusion

Kubernetes offers strong native primitives for scaling, but the best results usually come from combining them. HPA handles changing workload demand, Cluster Autoscaler handles cluster capacity, and scheduled scaling prepares the platform for predictable peaks. Together, they create a more efficient and resilient operating model.