Debugging Kubernetes Pods Stuck in ContainerCreating: Log Collection and Troubleshooting Guide



When a pod gets stuck in the ContainerCreating state, the usual kubectl logs workflow won't help: the container hasn't started yet, so there is no log stream for the API server to return. Here's how to inspect the issue instead:

# This won't work for pending pods
kubectl logs pod-name
# Returns: Error from server (BadRequest): container "container-name" in pod "pod-name" is waiting to start: ContainerCreating

1. Check pod events:

kubectl describe pod pod-name
# Look for Events section which shows:
#   - Image pull attempts
#   - Volume mounting issues
#   - Resource constraints
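
If you only need the waiting reason for each container (ContainerCreating, ErrImagePull, ImagePullBackOff, ...), a jsonpath query can pull it out directly. This is a small convenience sketch; pod-name is a placeholder:

# Show the waiting reason per container
kubectl get pod pod-name -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{"\n"}{end}'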

2. Examine cluster-wide events:

kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'
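
To narrow the cluster-wide listing down to a single pod, a field selector helps; pod-name and the default namespace below are placeholders:

# Only events that reference this pod
kubectl get events -n default --field-selector involvedObject.name=pod-name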

Image Pull Problems:

# Example error in describe output:
Events:
  Warning  Failed     12s (x3 over 42s)  kubelet            Failed to pull image "private-repo/image:v1":
  rpc error: code = Unknown desc = failed to pull and unpack image "private-repo/image:v1":
  failed to resolve reference "private-repo/image:v1": pull access denied
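
For a pull-access-denied error like the one above, it's worth confirming that the pod actually references a pull secret and that the secret points at the right registry. regcred is an assumed secret name here:

# Which pull secrets does the pod reference?
kubectl get pod pod-name -o jsonpath='{.spec.imagePullSecrets[*].name}'

# Decode the secret to confirm the registry and username are what you expect
kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d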

Persistent Volume Issues:

# Typical error pattern:
Events:
  Warning  FailedMount  3m2s (x8 over 8m12s)  kubelet  MountVolume.SetUp failed for volume "pvc-123" :
  timeout expired waiting for volumes to attach or mount for pod "default"/"pod-name".
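
For mount timeouts like this one, checking whether the CSI attach step ever completed usually narrows things down. This assumes a CSI-provisioned volume, and <node-name> is a placeholder:

# Did the CSI attach step complete for this node? (CSI volumes only)
kubectl get volumeattachments | grep <node-name>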

For deeper inspection of kubelet operations:

# Check kubelet logs (requires node access)
journalctl -u kubelet -n 100 --no-pager

# Verify container runtime status
crictl ps -a | grep pod-name
crictl logs container-id
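
If no containers show up at all, the pod sandbox itself may have failed to start (often a CNI problem). crictl can show the sandbox state; this assumes a CRI runtime such as containerd, with pod-name as a placeholder:

# Inspect the pod sandbox rather than the containers
crictl pods --name pod-name
crictl inspectp <pod-sandbox-id>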

A quick checklist when a pod is stuck in ContainerCreating:

  1. Check kubectl describe pod for immediate errors
  2. Verify image accessibility with kubectl get events
  3. Inspect persistent volume claim status
  4. Check node resource allocation
  5. Review network policies affecting the pod

Putting the first few steps together for an example pod:

# Start with pod inspection
kubectl get pod web-app-5dfd6f7d4-abc12 -o wide

# Check detailed status
kubectl describe pod web-app-5dfd6f7d4-abc12 | grep -A 20 Events

# Verify image pull secrets
kubectl get secret regcred -o yaml

# Check persistent volume status
kubectl get pvc
kubectl describe pvc web-app-storage
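
The commands above cover the first three checklist items; for node resource allocation and network policies (items 4 and 5), something along these lines works, with the node name and namespace as placeholders:

# How much of the node is already requested?
kubectl describe node <node-name> | grep -A 8 "Allocated resources"

# Any network policies in the pod's namespace?
kubectl get networkpolicy -n <namespace>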

When a Kubernetes pod gets stuck in the "ContainerCreating" state, it typically indicates a problem during the container initialization phase. Unlike with a running container, kubectl logs can't reach a container that hasn't started, so you have to work from events and node-level logs instead. Here's how to approach this systematically.

The first diagnostic step is examining pod events:

kubectl describe pod <pod-name> -n <namespace>

Look for warning messages in the "Events" section. Common issues include:

  • Image pull failures ("ErrImagePull")
  • Insufficient resources ("Insufficient cpu/memory")
  • Volume mounting problems ("Unable to attach volume")

When pod events aren't sufficient, check the kubelet logs on the node where the pod is scheduled:

# Find the node hosting your pod
kubectl get pod <pod-name> -n <namespace> -o wide

# SSH into the node and check kubelet logs
journalctl -u kubelet --since "1 hour ago" | grep -i "error\|fail\|warning"
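
If the kubelet log only shows a generic CRI error, the container runtime's own log is often more specific; this assumes containerd is the runtime on the node:

# Runtime-level errors: image pulls, snapshotter, CNI plugin calls
journalctl -u containerd --since "1 hour ago" | grep -i "error\|fail"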

1. Image Pull Issues

If you see "ImagePullBackOff" or "ErrImagePull":

# Verify image exists and credentials are correct
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

# For private registries, ensure imagePullSecrets are configured:
apiVersion: v1
kind: Pod
metadata:
  name: private-reg-pod
spec:
  containers:
  - name: private-reg-container
    image: private.registry.com/image:tag
  imagePullSecrets:
  - name: regcred
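
If the regcred secret doesn't exist yet, it can be created directly; the registry URL and credentials below are placeholders:

kubectl create secret docker-registry regcred \
  --docker-server=private.registry.com \
  --docker-username=<user> \
  --docker-password=<password> \
  --docker-email=<email>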

2. Persistent Volume Claims

Storage-related hangs usually come down to a PVC that never binds or a StorageClass that doesn't exist. Check both:

# Check PVC status
kubectl get pvc -n <namespace>

# Verify StorageClass exists
kubectl get storageclass

# Example of a working PVC definition:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
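
If a claim like this stays Pending, describing it usually names the cause (no matching PV, missing or misspelled StorageClass, provisioner errors); mypvc matches the example above:

# Why is the claim not binding?
kubectl describe pvc mypvc | grep -A 10 Events

# Any pre-provisioned PVs available? (static provisioning)
kubectl get pv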

3. Resource Constraints

If no node has enough allocatable CPU or memory to satisfy the pod's requests:

# Check node capacity vs requests
kubectl describe node <node-name>

# Adjust resource requests in your deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:1.0
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"

For complex cases, consider:

# Enable verbose kubelet logging (add this flag to the kubelet)
--v=4

# On clusters older than 1.25, ephemeral containers also need the feature gate
# enabled (API server and kubelet):
--feature-gates=EphemeralContainers=true

# Use ephemeral debug containers (Kubernetes 1.18+)
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
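
If SSH to the node isn't an option, kubectl debug can also start a troubleshooting pod on the node itself, with the host filesystem mounted under /host; the node name is a placeholder:

# Open a shell on the node (host filesystem mounted at /host)
kubectl debug node/<node-name> -it --image=busybox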

To reduce the chance of pods getting stuck in the first place:

  • Implement readiness/liveness probes to catch initialization failures early
  • Set reasonable resource requests and limits
  • Use pod disruption budgets for critical workloads
  • Monitor pod startup times with Prometheus metrics