Since Kubernetes 1.8, the kubelet refuses to start on a node with swap enabled, a behavior controlled by the --fail-swap-on flag. This requirement often puzzles administrators coming from traditional Linux administration, where swap is considered a safety net.
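In practice, satisfying the kubelet's check just means turning swap off on each node. A minimal sketch, assuming a conventional /etc/fstab layout:
# Disable swap immediately (lasts until the next reboot)
sudo swapoff -a
# Comment out swap entries in /etc/fstab so the setting survives reboots
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab
# Verify: neither command should report active swap
swapon --show
free -h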
Kubernetes operates on several fundamental principles that conflict with swap usage:
- Predictable Scheduling: The scheduler places pods based on declared memory requests and assumes physical RAM is the only memory available
- Quality of Service (QoS): Memory-starved pods could silently thrash on swap instead of being evicted or OOM-killed according to their QoS class
- Performance Degradation: Disk-backed memory causes unpredictable latency
Consider this scenario where swap causes issues:
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
With swap enabled, this pod might run but experience severe performance degradation as memory pages shuffle between RAM and disk.
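With swap disabled, the same pod is instead OOM-killed quickly and visibly. You can confirm this yourself (assuming the manifest above is saved as memory-demo.yaml, a name used here for illustration):
kubectl apply -f memory-demo.yaml
# Watch the pod restart; the container's last state should show reason OOMKilled
kubectl get pod memory-demo
kubectl describe pod memory-demo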
For those needing memory flexibility:
# Temporary workaround (not recommended for production)
kubelet --fail-swap-on=false
# Better solution: Configure proper memory limits
kubectl set resources deploy/myapp --limits=memory=512Mi
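After adjusting limits this way, it is worth confirming what actually landed in the Deployment spec (myapp is the deployment name from the example above):
# Print the rendered requests/limits for the first container
kubectl get deploy myapp -o jsonpath='{.spec.template.spec.containers[0].resources}'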
Exception cases include:
- Development clusters with limited resources
- Legacy applications with unpredictable memory patterns
- Edge/IoT deployments with extreme resource constraints
Remember that any swap-enabled configuration should include proper monitoring:
# Sample Prometheus alert for swap usage
- alert: HighSwapUsage
  expr: (1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100 > 80
  for: 10m
  labels:
    severity: warning
Since Kubernetes 1.8, the kubelet's behavior around swap memory has been explicitly strict: the --fail-swap-on flag defaults to true, forcing administrators to either disable swap entirely or consciously override this safety measure. This design choice stems from fundamental architectural decisions in Kubernetes' resource management model.
Three primary technical factors drive this requirement:
1. Predictable Scheduling: The Kubernetes scheduler relies on precise memory accounting (see the command after this list)
2. Quality of Service (QoS) Guarantees: Swap interferes with pod priority enforcement
3. Performance Isolation: Swapping introduces non-deterministic latency
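The scheduler works only from each node's allocatable resources, which never include swap. You can inspect exactly what it budgets against (the node name is a placeholder):
# Show the memory the scheduler is allowed to allocate on this node
kubectl describe node <node-name> | grep -A 6 "Allocatable"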
Consider this node memory scenario:
apiVersion: v1
kind: Pod
metadata:
  name: memory-hog
spec:
  containers:
  - name: stress
    image: polinux/stress
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "1Gi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "1500M"]
With swap enabled, this pod might appear to run successfully while actually suffering severe performance degradation due to swapping, misleading both users and monitoring systems.
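QoS enforcement is part of what breaks here. Once the pod is running, you can check which QoS class Kubernetes assigned to it:
# Prints the assigned QoS class (Guaranteed, Burstable, or BestEffort)
kubectl get pod memory-hog -o jsonpath='{.status.qosClass}'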
For environments where swap cannot be disabled (like development laptops), Kubernetes provides the escape hatch:
# For kubelet configuration
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
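The same override can be expressed in the kubelet's configuration file rather than as a command-line flag. A minimal sketch; the file path shown is typical for kubeadm-provisioned nodes and varies by distribution:
# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false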
However, this comes with critical caveats:
- Pod memory limits become approximations
- No guarantee of QoS class enforcement
- Potential for resource starvation attacks
Instead of using swap, consider these Kubernetes-native solutions:
# Vertical Pod Autoscaler example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
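Note that the Vertical Pod Autoscaler is not built into Kubernetes; its components must be installed in the cluster separately. Once it has observed the workload for a while, its sizing recommendations can be inspected directly:
# Shows target and bound recommendations once the VPA has collected data
kubectl describe vpa my-app-vpa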
Configure Prometheus to track real memory pressure:
# prometheus-rules.yaml
- alert: HighMemoryPressure
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
  for: 5m
  labels:
    severity: critical
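As written, this is a rule fragment; in a complete rules file it must sit inside a group. A minimal sketch (the group name is illustrative, and the node_memory_* metrics assume node_exporter is running on the nodes):
groups:
- name: node-memory
  rules:
  - alert: HighMemoryPressure
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
    for: 5m
    labels:
      severity: critical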