When debugging Docker container initialization issues, especially with low-level system modifications, we often encounter puzzling symptoms. In this case, basic commands like ls work while their variants (ls -l) fail silently. Here's how to systematically approach such problems:
The root cause lies in the LD_LIBRARY_PATH modification, which changes which libnss_files.so.2 glibc loads for NSS lookups (hostnames, but also users and groups). The workaround attempts to override /etc/hosts through these steps:
RUN mkdir -p -- /lib-override /etc-override && cp /lib/libnss_files.so.2 /lib-override
ADD hosts.template /etc-override/hosts
RUN perl -pi -e 's:/etc/hosts:/etc-override/hosts:g' /lib-override/libnss_files.so.2
ENV LD_LIBRARY_PATH /lib-override
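For reference, hosts.template is just an ordinary hosts-format file baked into the image; a minimal sketch (the second host name is purely an example):
127.0.0.1   localhost
127.0.0.1   my-custom-host.local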
1. Enable Docker Daemon Debug Logging
Start the Docker daemon with debug mode:
dockerd --debug
Or edit /etc/docker/daemon.json:
{
  "debug": true,
  "log-level": "debug"
}
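After changing daemon.json, restart the daemon and watch its logs to confirm debug output is flowing (systemd commands assumed; adjust for your init system):
sudo systemctl restart docker
journalctl -u docker.service -f   # debug-level daemon logs appear here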
2. Using strace for System Call Tracing
For containers that fail silently, strace becomes invaluable:
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it your_image strace ls -l
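If the full trace is too noisy, limiting it to file-open and network calls makes the NSS activity easy to spot (this assumes strace is installed in the image):
docker run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined your_image \
    strace -f -e trace=openat,connect ls -l 2>&1 | grep -E 'passwd|group|hosts|nss'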
3. Alternative to the libnss_files Modification
The cleaner solution avoids LD_LIBRARY_PATH manipulation entirely: leave libnss_files.so.2 alone and supply the custom hosts file at run time, either with --add-host (shown further down) or by bind-mounting it over /etc/hosts:
# Instead of modifying libnss_files.so.2, bind-mount a custom hosts file at runtime
docker run --rm -v "$PWD/hosts.template:/etc/hosts:ro" your_image
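To confirm the mounted file is the one NSS actually consults (my-custom-host.local is the example entry from hosts.template above):
docker run --rm -v "$PWD/hosts.template:/etc/hosts:ro" your_image getent hosts my-custom-host.local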
The key difference lies in which NSS lookups these commands trigger:
# Plain ls only reads directory entries and never touches NSS.
# ls -l additionally triggers NSS lookups for:
# - User/group resolution via getpwuid()/getgrgid() (for ownership display)
# - Potentially remote lookups (if nsswitch.conf routes passwd/group to NIS or LDAP)
# With LD_LIBRARY_PATH=/lib-override, these lookups load the patched libnss_files.so.2,
# which is why ls succeeds while ls -l fails.
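You can observe the difference directly by tracing which files each command opens (again assuming strace is present in the image):
docker run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined your_image \
    sh -c 'strace -e trace=openat ls 2>&1 | grep -c nss; strace -e trace=openat ls -l 2>&1 | grep -c nss'
# Expect 0 for plain ls and a non-zero count for ls -l: only ls -l pulls in the NSS libraries.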
For Kubernetes environments:
apiVersion: v1
kind: Pod
metadata:
  name: host-aliases
spec:
  hostAliases:
    - ip: "127.0.0.1"
      hostnames:
        - "foo.local"
        - "bar.local"
  containers:
    - name: app
      image: your_image
For Docker Compose:
services:
  app:
    image: your_image
    extra_hosts:
      - "somehost:162.242.195.82"
      - "otherhost:50.31.209.229"
To confirm your container's NSS configuration:
docker run --rm your_image sh -c 'ldd "$(which ls)"'   # single quotes so the path is resolved inside the container
docker run --rm your_image getent hosts
docker run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined your_image \
    strace -e openat ls -l 2>&1 | grep hosts
When a Docker container builds successfully but fails during initialization, it's often related to runtime configuration or environment variables. The specific case here involves a custom /etc/hosts modification through library overrides, which works for simple commands like ls but fails for ls -l.
Here are several ways to investigate such issues:
# 1. Check container logs (even if they appear empty)
docker logs --details CONTAINER_ID
# 2. Run with a debug flag your image's entrypoint understands
#    (DOCKER_VERBOSE has no built-in meaning to Docker itself; it only helps if your entrypoint checks it)
docker run --env "DOCKER_VERBOSE=1" your_image
# 3. Force interactive mode with shell fallback
#    (single quotes so $? is expanded inside the container, not by your host shell)
docker run -it --entrypoint=/bin/sh your_image -c 'ls -l || echo "Failed with status $?"'
When standard methods don't reveal the issue:
# 4. Use strace to trace system calls
docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
your_image strace ls -l
# 5. Inspect the container's filesystem
docker run --rm -it --entrypoint=/bin/sh your_image
# Inside container:
mount | grep overlay
cat /proc/mounts
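While you are inside that shell, also check the pieces of the hosts override itself:
# Inside container:
echo "$LD_LIBRARY_PATH"              # should print /lib-override
ls -la /lib-override /etc-override   # patched libnss_files.so.2 and the hosts copy
cat /etc-override/hosts              # confirm hosts.template made it into the image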
The original approach modifies libnss_files.so.2 to use custom hosts files. While creative, this can cause subtle issues:
# Better alternative: Use --add-host at runtime
docker run --add-host custom.host:127.0.0.1 your_image
# Or in docker-compose:
extra_hosts:
- "custom.host:127.0.0.1"
Create a debug-friendly image variant:
FROM your_base_image
RUN apt-get update && apt-get install -y \
strace lsof procps
ENTRYPOINT ["/bin/bash", "-c"]
CMD ["strace -f -o /tmp/debug.log your_command"]
Implement proper logging in your application:
# Python example
import logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[logging.StreamHandler()]
)
# Bash example
exec > >(tee /var/log/startup.log) 2>&1
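That bash line typically sits at the top of the image's entrypoint script, so everything the container prints during startup is captured in the log as well as streamed to docker logs; a minimal sketch (entrypoint.sh is a hypothetical name):
#!/bin/bash
# entrypoint.sh: duplicate all startup output to a log file while keeping it on stdout
exec > >(tee /var/log/startup.log) 2>&1
echo "Container starting with args: $*"
exec "$@"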