When your Kubernetes pod enters a CrashLoopBackOff state, the container starts but then crashes repeatedly, and Kubernetes applies an exponential backoff delay between restart attempts (doubling from 10 seconds up to a five-minute cap). From your description, the pod has restarted 72 times in 5 hours, which points to a persistent failure rather than a transient glitch.
Before diving deep, run through the basic troubleshooting steps you should always perform:
# Get pod details
kubectl describe pod quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4
# Check container logs (even if they seem empty)
kubectl logs quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 --previous
# Check events at cluster level
kubectl get events --sort-by='.metadata.creationTimestamp'
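To cut the event stream down to just this pod, a field selector helps:
# Only events involving the failing pod
kubectl get events --field-selector involvedObject.name=quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4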
From your event logs, we can see the pattern:
- Container creates successfully
- Container starts successfully
- Then crashes shortly after
- Kubernetes attempts to restart with increasing delays
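To watch the crash cycle and backoff in real time:
# Stream status changes for the failing pod
kubectl get pod quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 -w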
The key observation here is that the container starts but then exits. This typically indicates one of several common issues:
1. Application Crashes Immediately
Your application might be throwing an uncaught exception or failing some startup check. Try:
# Run the container locally in debug mode
docker run -it --entrypoint=/bin/sh us.gcr.io/skywatch-app/quasar-api-staging:15.0
# Then manually start your application to see errors
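It also helps to confirm what the image is actually configured to run, assuming you can pull it locally:
# Show the image's entrypoint and default command
docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' us.gcr.io/skywatch-app/quasar-api-staging:15.0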
2. Missing Dependencies or Configuration
The pod might be missing:
- Environment variables
- ConfigMaps or Secrets
- Volume mounts
Check your deployment YAML for these requirements:
# Example of checking environment variables
kubectl set env deployment/quasar-api-staging --list
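You can also confirm which ConfigMaps and Secrets the container references (assuming the deployment is named quasar-api-staging, as the pod name suggests):
# ConfigMap/Secret references in the container spec
kubectl get deployment quasar-api-staging -o jsonpath='{.spec.template.spec.containers[0].envFrom}'
# Verify the referenced objects actually exist
kubectl get configmaps,secrets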
3. Resource Constraints
Your container might be getting OOMKilled. Check:
kubectl describe pod quasar-api-staging-... | grep -i "oom"
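If metrics-server is installed in your cluster (an assumption; it isn't always), you can compare live usage against the configured limits:
# Current CPU/memory consumption (only works while the container is up)
kubectl top pod quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4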
Using Ephemeral Containers for Debugging
kubectl debug (introduced as alpha in Kubernetes 1.18; ephemeral containers went GA in 1.25) lets you attach a throwaway debugging container to a running pod:
kubectl debug -it quasar-api-staging-... --image=busybox --target=quasar-api-staging
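Once inside the debug shell, the --target flag shares the application container's process namespace (where the container runtime supports it), so you can inspect the crashing process directly. A sketch of what to look at, where <pid> stands for whatever the app's PID shows up as in ps:
# List the target container's processes
ps
# Browse the app container's filesystem through procfs
ls /proc/<pid>/root/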
Checking Container Exit Codes
The exit code can reveal why your application failed:
kubectl get pod quasar-api-staging-... -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
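As a rough guide: 137 is SIGKILL (128 + 9, very often the OOM killer), 139 is a segmentation fault (128 + 11), 143 is SIGTERM (128 + 15), and 1 is a generic application error. The human-readable reason sits right next to the exit code:
# Reason string (e.g. OOMKilled, Error) from the last termination
kubectl get pod quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'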
Preventing Future Crashes
To avoid future CrashLoopBackOff scenarios:
- Implement proper logging in your application
- Add health checks (readiness and liveness probes)
- Set appropriate resource requests and limits
- Test your container images locally before deployment
Here's an example of a solid deployment configuration (the /healthz and /ready probe paths below are placeholders; point them at whatever health endpoints your app actually serves):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quasar-api-staging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: quasar-api
  template:
    metadata:
      labels:
        app: quasar-api
    spec:
      containers:
      - name: quasar-api-staging
        image: us.gcr.io/skywatch-app/quasar-api-staging:15.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
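To roll this out and confirm the pods stabilize (assuming the manifest is saved as deployment.yaml):
# Apply the manifest and watch the rollout complete
kubectl apply -f deployment.yaml
kubectl rollout status deployment/quasar-api-staging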
When Regular Logs Don't Show the Error
If describe and --previous logs come up empty, pull the full pod configuration as the cluster sees it, then inspect the container from the inside:
# Get the complete pod configuration
kubectl get pod quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 -o yaml
# Run a debug container alongside the pod (busybox is minimal; it has no curl)
kubectl debug -it quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 --image=busybox -- sh
# Check mounted volumes (kubectl exec only works while the container is up, so time it between crashes)
kubectl exec quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 -- ls /path/to/mount
# Test connectivity to dependencies (assumes curl exists in the application image)
kubectl exec quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 -- curl http://dependency-service
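If the container dies too quickly to exec into, a common workaround is to temporarily override its command so it idles instead of starting the app. This sketch assumes the deployment is named quasar-api-staging and the image contains a sleep binary; revert the patch when you're done:
# Keep the container alive for an hour so you can exec in and investigate
kubectl patch deployment quasar-api-staging --type json -p '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["sleep","3600"]}]'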
Local vs. Cluster Differences
Since your application runs locally but fails in the cluster, consider these differences:
- Environment variables (use kubectl set env deployment/quasar-api-staging --list to verify)
- Network policies and service meshes
- Volume mounts and permissions
- Cluster-specific configurations
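For the network side specifically, you can at least confirm whether any policies are in play:
# List network policies that could block traffic to dependencies
kubectl get networkpolicies --all-namespaces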
To compare environments, run:
# Get all environment variables
kubectl exec quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 -- env
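To go further, diff the cluster environment against the image's defaults (a sketch; assumes the image runs locally and your shell supports process substitution):
# Capture both environments and compare them
docker run --rm --entrypoint env us.gcr.io/skywatch-app/quasar-api-staging:15.0 > local.env
kubectl exec quasar-api-staging-14c385ccaff2519688add0c2cb0144b2-3r7v4 -- env > cluster.env
diff <(sort local.env) <(sort cluster.env)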