Kubernetes ImagePullBackOff and Pending Pod Troubleshooting
Complete guide to diagnosing and fixing the most common pod deployment failures
When pods get stuck in Pending or fail with ImagePullBackOff, deployments stop moving and teams lose time quickly. These are some of the most common Kubernetes failure states, and they usually point to a small set of root causes.
This guide walks through how to identify which failure mode you are dealing with, how to debug it, and what to put in place to reduce the chance of it happening again.
Quick Diagnosis: ImagePullBackOff vs Pending
The first step is to work out whether Kubernetes is failing to pull the container image or failing to schedule the pod onto a node.
ImagePullBackOff
Image pull failures show up like this:
kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
webapp-7d4f8b6c5d-xyz   0/1     ImagePullBackOff   0          2m
Then inspect the pod:
kubectl describe pod webapp-7d4f8b6c5d-xyz
Look for registry authentication problems, invalid image references, missing tags, or network failures.
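These root causes leave distinctive strings in the pod's event messages. As a rough sketch, the common ones can be classified like this (the helper function and the sample message are illustrative, not part of kubectl; real messages come from the Events section of `kubectl describe pod`):

```shell
# Classify a pod event message into a likely ImagePullBackOff root cause.
# The match strings below are typical fragments of container runtime errors.
classify_pull_error() {
  case "$1" in
    *"unauthorized"*|*"authentication required"*) echo "registry-auth" ;;
    *"manifest unknown"*|*"not found"*)           echo "bad-image-or-tag" ;;
    *"no such host"*|*"i/o timeout"*)             echo "network-or-dns" ;;
    *)                                            echo "unknown" ;;
  esac
}

# Hypothetical event message for illustration:
classify_pull_error 'Failed to pull image "webapp:v9": manifest unknown'
```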
Pending Pods
Pending pods are different. They never get to the image pull stage because the scheduler cannot place them:
kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   NODE
database-abc123-def   0/1     Pending   0          5m    <none>
If the NODE column shows <none>, you are dealing with a scheduling issue rather than an image pull issue.
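As a sketch, you can scan that output for unscheduled pods by checking the NODE column (the sample output here is hard-coded for illustration; normally it would be piped in from `kubectl get pods -o wide`):

```shell
# Print pods whose NODE column is <none>, i.e. pods the scheduler has not
# placed yet. Sample output is hard-coded so the snippet is self-contained.
sample_output='NAME                  READY  STATUS   RESTARTS  AGE  NODE
database-abc123-def   0/1    Pending  0         5m   <none>
webapp-7d4f8b6c5d-x   1/1    Running  0         2h   node-1'

# Skip the header row, match rows whose last field is <none>.
echo "$sample_output" | awk 'NR > 1 && $NF == "<none>" { print $1 }'
```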
Step-by-Step ImagePullBackOff Troubleshooting
Check Image Names and Tags
Start by confirming the image reference in the workload:
kubectl get deployment webapp -o yaml | grep image:
Common mistakes include:
- typos in repository or image names
- missing registry prefixes for private images
- non-existent tags
- architecture mismatches between the image and the node
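The tag mistakes above lend themselves to a quick sanity check. A minimal sketch in shell (the pattern match is a simplification, not a full OCI reference parser, and will misread registries that include a port):

```shell
# Flag image references with no tag or the mutable `latest` tag.
# Simplified: a registry host with a port (e.g. reg.com:5000/app)
# would be misclassified by the bare *:* pattern.
check_image_tag() {
  case "$1" in
    *:latest) echo "warn: mutable latest tag" ;;
    *:*)      echo "ok: pinned tag" ;;
    *)        echo "warn: no tag, defaults to latest" ;;
  esac
}

check_image_tag "your-registry.com/webapp:v1.4.2"
check_image_tag "your-registry.com/webapp"
```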
Configure ImagePullSecrets for Private Registries
For private registries, make sure imagePullSecrets is present and correct:
kubectl create secret docker-registry regcred \
  --docker-server=your-registry.com \
  --docker-username=your-username \
  --docker-password=your-password \
  --docker-email=your-email@domain.com
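It is worth verifying that the created secret actually names the registry you expect. A docker-registry secret stores credentials as base64-encoded JSON under the `.dockerconfigjson` key; the payload below is a minimal hand-built sample of that format, decoded locally for illustration:

```shell
# Against a real cluster you would read the stored payload with:
#   kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
# Here we build a minimal sample payload to show its shape.
sample=$(printf '{"auths":{"your-registry.com":{"username":"your-username"}}}' | base64)
echo "$sample" | base64 -d
```

If the decoded JSON does not list your registry under `auths`, the secret was created against the wrong server and pulls will still fail with authentication errors.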
Then reference it in the pod spec:
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - name: webapp
          image: your-registry.com/webapp:latest
Check Registry Reachability
If the image reference and secret are correct, inspect events for DNS or network issues:
kubectl get events --sort-by=.metadata.creationTimestamp
That often reveals whether the issue is authentication, DNS resolution, or registry connectivity.
Resolving Pending Pod Issues
Pending pods usually point to capacity, scheduling constraints, or policy mismatches.
Resource Pressure
Check node capacity first:
kubectl top nodes
kubectl describe nodes
Then inspect the pending pod:
kubectl describe pod pending-pod-name
Look for FailedScheduling events that mention insufficient CPU, memory, or storage.
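FailedScheduling messages pack the per-node reasons into one line, so it helps to pull out the resource shortfalls explicitly. A sketch, using a hypothetical event message of the typical shape:

```shell
# A typical FailedScheduling event message, hard-coded for illustration;
# in practice it appears in `kubectl describe pod` under Events.
msg='0/3 nodes are available: 2 Insufficient cpu, 1 node(s) had untolerated taint.'

# Extract just the resource-shortage reasons from the message.
echo "$msg" | grep -o 'Insufficient [a-z]*'
```

An `Insufficient cpu` or `Insufficient memory` match points at capacity; if nothing matches, the blocker is more likely a taint, affinity rule, or volume constraint.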
Node Affinity, Taints, and Tolerations
If resources look available, check whether the pod has affinity rules, node selectors, or missing tolerations:
kubectl get nodes --show-labels
kubectl describe node node-name
kubectl get pod pending-pod -o yaml | grep -A 10 affinity
If needed, add tolerations like:
spec:
  tolerations:
    - key: 'node-type'
      operator: 'Equal'
      value: 'compute'
      effect: 'NoSchedule'
The main point is to verify that at least one node in the cluster actually matches the pod’s scheduling requirements.
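Note that a toleration only removes a barrier; it does not attract the pod to a node. If the pod should also land on specific nodes, pair the toleration with a nodeSelector. A minimal sketch (the label key and value here are illustrative and must match labels actually present on your nodes):

```yaml
spec:
  nodeSelector:
    node-type: compute
```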
Prevention Best Practices
Image Management
- use specific image tags instead of latest
- scan images before deployment
- validate private registry access before rollout
- keep image architecture aligned to node architecture
Resource Planning
- set realistic requests and limits
- monitor node capacity trends
- use resource quotas where appropriate
- right-size workloads before the cluster is under pressure
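As a sketch, realistic requests and limits in a container spec look like this (the values are placeholders to tune per workload, not recommendations):

```yaml
resources:
  requests:
    cpu: '250m'
    memory: '256Mi'
  limits:
    cpu: '500m'
    memory: '512Mi'
```

Requests are what the scheduler reserves, so overstated requests cause Pending pods even on a cluster with plenty of actual headroom.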
Cluster Health Monitoring
Keep an eye on events, pod health, and cluster resource use:
kubectl get events --watch
kubectl top pods --all-namespaces
That makes it easier to catch scheduling and image issues before they turn into a larger outage.
Conclusion
ImagePullBackOff and Pending are different problems, but both become much easier to solve once you identify which layer is failing. Image pull issues usually come down to image references, credentials, or registry connectivity. Pending pods usually point to resource constraints or scheduling rules. A quick, structured diagnosis path saves a lot of time when the cluster starts pushing back.