When investigating unexpected pod restarts in Kubernetes, start by examining the pod's lifecycle events:
kubectl describe pod [pod-name] -n [namespace]
Look for sections like:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned default/nginx to gke-cluster
Warning Unhealthy 2m (x3 over 4m) kubelet Liveness probe failed: HTTP probe failed
Normal Killing 2m kubelet Container nginx failed liveness probe, will be restarted
Based on your single-node GKE setup, these are likely culprits:
- Resource constraints - Check if your pod is hitting memory or CPU limits
- Failed health checks - Review your liveness/readiness probe configurations
- Node pressure - Even in single-node clusters, system components can evict pods
- Application crashes - Check application logs for uncaught exceptions
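One quick way to narrow this down is to check why the container last terminated; the lastState field records reasons such as OOMKilled or Error (substitute your pod name as above):
kubectl get pod [pod-name] -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'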
Create a Prometheus alert rule to detect frequent restarts:
groups:
- name: pod-restart-alerts
  rules:
  - alert: FrequentPodRestarts
    expr: increase(kube_pod_container_status_restarts_total[5m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.pod }} is restarting frequently"
      description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} has restarted {{ $value }} times in the last 5 minutes"
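Note that kube_pod_container_status_restarts_total is exposed by kube-state-metrics, so the rule only fires if kube-state-metrics is installed and scraped. Assuming its Service is named kube-state-metrics in kube-system and serves metrics on port 8080 (names and ports vary by install), you can confirm the metric exists with something like:
kubectl port-forward -n kube-system svc/kube-state-metrics 8080:8080 &
curl -s localhost:8080/metrics | grep kube_pod_container_status_restarts_total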
Combine multiple approaches for effective troubleshooting:
# Get previous container logs if pod crashed
kubectl logs [pod-name] --previous
# Check resource usage history
kubectl top pod [pod-name] --containers
# View OOM killer events (if memory related)
kubectl get events --field-selector=reason=OOMKilling
For your personal website deployment, add these safeguards:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: website
spec:
  replicas: 2  # Basic HA
  selector:
    matchLabels:
      app: website
  template:
    metadata:
      labels:
        app: website
    spec:
      containers:
      - name: web
        image: nginx:stable  # replace with your website image
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
For immediate alerts, create a Kubernetes Event Exporter with Slack integration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
spec:
  selector:
    matchLabels:
      app: event-exporter
  template:
    metadata:
      labels:
        app: event-exporter
    spec:
      containers:
      - name: event-exporter
        image: ghcr.io/resmo/kubernetes-event-exporter:latest
        env:
        - name: SLACK_WEBHOOK_URL
          value: "https://hooks.slack.com/services/..."
        args:
        - --config=/etc/config.yaml
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config.yaml
          subPath: config.yaml
      volumes:
      - name: config-volume
        configMap:
          name: event-exporter-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-config
data:
  config.yaml: |
    logLevel: debug
    routes:
      - match:
          - reason: "Started"
          - reason: "Killing"
          - reason: "BackOff"
          - reason: "Unhealthy"
        sink: slack
    sinks:
      - name: slack
        slack:
          webhookurl: ${SLACK_WEBHOOK_URL}
          message: "Event: {reason}\nPod: {involvedObject.name}\nNamespace: {involvedObject.namespace}\nMessage: {message}"
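Assuming you save both manifests to a file such as event-exporter.yaml, a minimal rollout check looks like this (the exporter also needs RBAC permission to list and watch events cluster-wide, which is omitted here):
kubectl apply -f event-exporter.yaml
kubectl logs -f deployment/event-exporter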
When running applications in Kubernetes, unexpected container restarts can occur for various reasons. The key is to understand the root cause and implement proper monitoring. Here's how to investigate:
First, examine your pod's status and restart count:
kubectl get pods --all-namespaces
kubectl describe pod [POD_NAME]
Look for the Restart Count field and the Last State entry in the container status section.
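If you are not sure which pod to look at first, sorting by restart count surfaces the noisiest ones (this sorts on the first container's count):
kubectl get pods --all-namespaces --sort-by='.status.containerStatuses[0].restartCount'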
Examine both current and previous container logs:
kubectl logs [POD_NAME] --previous
kubectl logs [POD_NAME] --tail=50
Common causes include (a quick way to tell them apart follows this list):
- OOMKilled (Out of Memory)
- CrashLoopBackOff
- Liveness probe failures
- Node resource pressure
- Manual pod eviction
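The last exit code usually helps tell these apart: 137 corresponds to SIGKILL (128 + 9), which is what you typically see after an OOM kill or a forced termination, while small non-zero codes such as 1 usually point to an application crash. For example:
kubectl get pod [POD_NAME] -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'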
Create Prometheus alerts for container restarts:
groups:
- name: container-restarts
  rules:
  - alert: HighContainerRestarts
    expr: increase(kube_pod_container_status_restarts_total[5m]) > 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is restarting frequently"
      description: "Container {{ $labels.container }} in pod {{ $labels.pod }} has restarted {{ $value }} times in the last 5 minutes"
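Before loading the rule, it is worth validating it; if you have promtool available (it ships with Prometheus), a check like this catches syntax mistakes (the filename is just an example):
promtool check rules container-restarts.yml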
For OOM issues, check container memory limits:
kubectl get pod [POD_NAME] -o json | jq '.spec.containers[].resources'
For liveness probe failures:
kubectl describe pod [POD_NAME] | grep -A 10 "Liveness"
Enable Kubernetes events monitoring:
kubectl get events --sort-by='.metadata.creationTimestamp'
kubectl get events --field-selector involvedObject.kind=Pod
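You can also watch events as they arrive rather than querying after the fact:
kubectl get events -w --field-selector involvedObject.kind=Pod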
For deeper investigation, check kubelet logs on the node:
journalctl -u kubelet --no-pager -n 100
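On GKE you do not have a shell on the node by default; assuming you can SSH to it with gcloud, the equivalent would be roughly (node name and zone are placeholders):
gcloud compute ssh [NODE_NAME] --zone=[ZONE]
journalctl -u kubelet --no-pager -n 100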
To reduce restarts going forward:
- Implement proper resource requests/limits
- Configure appropriate liveness/readiness probes
- Set up pod disruption budgets for critical workloads (see the example after this list)
- Monitor node resource utilization
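For the pod disruption budget mentioned above, a one-liner is enough for a small deployment; this assumes your pods carry an app=website label and uses website-pdb as an example name:
kubectl create poddisruptionbudget website-pdb --selector=app=website --min-available=1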
Create a simple script to monitor and notify about restarts:
#!/bin/bash
# Poll every 60 seconds and report any pod whose containers have restarted.
while true; do
  kubectl get pods -o json \
    | jq -r '.items[] | select(any(.status.containerStatuses[]?; .restartCount > 0)) | .metadata.name' \
    | while read -r pod; do
        echo "ALERT: $pod has restarted"
        # Add your notification logic here (email, Slack, etc.)
      done
  sleep 60
done