Mounting Private /proc in Docker Namespaces: Minimal Privilege Requirements and Solutions



When working with nested namespaces inside Docker containers, mounting a private /proc filesystem presents unique security and privilege challenges. The core issue stems from Linux kernel security mechanisms interacting with container isolation layers.

The following commands show how the outcome depends on the privileges granted to the container:

# Full privileged access works
sudo docker run --privileged --security-opt=seccomp=unconfined \
    -it fedora:rawhide /usr/bin/unshare -Ufmp -r \
    /bin/sh -c 'mount -t proc proc /proc'

# CAP_SYS_ADMIN alone fails
sudo docker run --cap-add=sys_admin --security-opt=seccomp=unconfined \
    -it fedora:rawhide /usr/bin/unshare -Ufmp -r \
    /bin/sh -c 'mount -t proc proc /proc'

Security-Enhanced Linux (SELinux) policy enforcement plays a significant role here. Without the user namespace flags (-U and -r), disabling SELinux labeling for the container allows the mount with CAP_SYS_ADMIN alone:

# Disable SELinux labeling for this container only
sudo docker run --cap-add=sys_admin --security-opt label=disable \
    -it fedora:rawhide /usr/bin/unshare -fmp /bin/sh -c \
    'mount --make-private / ; mount -t proc proc /proc'

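For a configuration that avoids --privileged while still allowing the mount, consider the following. It grants fewer privileges than --privileged, but it still turns off seccomp, AppArmor, and SELinux labeling for the container, so it is not a tightly confined setup:

# Working configuration without --privileged
sudo docker run --cap-add=SYS_ADMIN --cap-add=SYS_CHROOT \
    --security-opt seccomp=unconfined \
    --security-opt apparmor=unconfined \
    --security-opt label=disable \
    -it fedora:rawhide /usr/bin/unshare -fmp \
    /bin/sh -c 'mount --make-private /; mount -t proc proc /proc'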
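
Whether mount --make-private / took effect can be checked from /proc/self/mountinfo, where a shared:N or master:N tag in the optional fields marks a mount as shared or slave, and the absence of both indicates private propagation. The following is a minimal sketch; the helper name mount_propagation is hypothetical and not part of any tool used above, and it deliberately ignores less common cases such as unbindable mounts.

```shell
# Hypothetical helper: report the propagation type of a mount point.
# $1 = mount point, $2 = mountinfo file (defaults to /proc/self/mountinfo)
mount_propagation() {
    awk -v mp="$1" '
        $5 == mp {
            # Field 5 is the mount point; optional fields start at
            # field 7 and run until the "-" separator.
            prop = "private"
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/) prop = "shared"
                else if ($i ~ /^master:/ && prop != "shared") prop = "slave"
            }
            result = prop   # the last matching line wins
        }
        END { if (result != "") print result; else exit 1 }
    ' "${2:-/proc/self/mountinfo}"
}
```

Run inside the unshared shell after mount --make-private /, a call such as mount_propagation / would be expected to print private. The function returns a non-zero status when the mount point is not listed.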

The EPERM (Operation not permitted) error can have several causes:

  1. mount(2) requires CAP_SYS_ADMIN, which Docker does not grant by default
  2. The user namespace remapping conflicts with SELinux labeling
  3. The seccomp filter may block syscalls needed for nested mounts
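
Two of these conditions can be inspected from inside the container by reading fields of /proc/self/status. The sketch below assumes a Linux /proc layout; the helper name status_field is hypothetical. In the Seccomp field, 0 means disabled, 1 strict mode, and 2 filter mode.

```shell
# Hypothetical helper: print one field from a /proc/<pid>/status file.
# $1 = field name (e.g. Seccomp), $2 = status file (defaults to /proc/self/status)
status_field() {
    awk -F':[ \t]*' -v key="$1" '
        $1 == key { print $2; found = 1 }
        END { exit !found }
    ' "${2:-/proc/self/status}"
}

# Print the fields relevant to the EPERM causes listed above.
if [ -r /proc/self/status ]; then
    for field in CapEff Seccomp NoNewPrivs; do
        printf '%s=%s\n' "$field" "$(status_field "$field")"
    done
fi
```

The SELinux side is not visible in this file; on hosts with SELinux tooling installed, getenforce reports the current enforcement mode.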

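For deeper investigation, trace the mount system calls with strace:

# Trace mount syscalls inside the container.
# strace wraps the command in the container: tracing the docker client would
# miss the mount, because the daemon, not the client, starts the container.
# Assumes strace is installed in the image and ptrace is permitted by seccomp.
sudo docker run --cap-add=sys_admin --security-opt label=disable \
    -it fedora:rawhide /usr/bin/unshare -fmp \
    strace -f -e trace=mount /bin/sh -c 'mount -t proc proc /proc'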

If the above solutions don't meet your needs, consider these alternatives:

# Using podman with --privileged instead of docker
podman run --privileged -it fedora:rawhide \
    /usr/bin/unshare -Ufmp -r /bin/sh -c 'mount -t proc proc /proc'

# Creating the namespace outside Docker first.
# Caveat: the Docker daemon, not the calling shell, creates the container,
# so the container does not inherit the namespaces set up by unshare here.
unshare -Ufmp -r
docker run --pid=host --net=host --ipc=host \
    -v /proc:/oldproc fedora:rawhide /bin/sh -c \
    'mount -t proc proc /proc'

With CAP_SYS_ADMIN instead of --privileged, the mount fails with the following error:

$ sudo docker run --cap-add=sys_admin --security-opt=seccomp=unconfined \
    -it fedora:rawhide /usr/bin/unshare -Ufmp -r \
    /bin/sh -c 'mount -t proc proc /proc'
mount: /proc: cannot mount proc read-only.

Disabling SELinux enforcement globally (for example with setenforce 0) works, but disabling labeling for a single container is more targeted:

sudo docker run --cap-add=sys_admin --security-opt label=disable \
    -it fedora:rawhide /usr/bin/unshare -fmp /bin/sh -c \
    'mount --make-private / ; mount -t proc proc /proc'

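The mount fails only when unshare is given the -U and -r flags, which suggests that the restriction applies to mounting proc from inside a new user namespace rather than to a missing capability in the container. These flags request:

  • -U: Create a new user namespace
  • -r: Map the current user to root in the new namespace (implies -U)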
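
The effect of -r can be seen in /proc/self/uid_map, which lists one mapping per line as three numbers: the first UID inside the namespace, the first UID outside it, and the length of the range. The sketch below only formats that file; the helper name describe_uid_map is hypothetical.

```shell
# Hypothetical helper: print each UID mapping in readable form.
# $1 = uid_map file (defaults to /proc/self/uid_map)
describe_uid_map() {
    while read -r inside outside count; do
        printf 'uid %s inside -> uid %s outside (range of %s)\n' \
            "$inside" "$outside" "$count"
    done < "${1:-/proc/self/uid_map}"
}
```

On a host that permits unprivileged user namespaces, running unshare -U -r cat /proc/self/uid_map as an ordinary user typically shows a single line that maps UID 0 inside the namespace to the caller's own UID outside it, with a range of 1.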

A configuration that avoids --privileged and lets unshare perform the mount:

$ sudo docker run --cap-add=SYS_ADMIN --cap-add=SYS_CHROOT \
    --security-opt seccomp=unconfined --security-opt label=disable \
    -it fedora:rawhide /usr/bin/unshare --mount-proc -fpUr \
    /bin/sh -c 'echo "Private proc mounted at $(readlink -f /proc/self)"'

Notes on this configuration:

  • Requires both CAP_SYS_ADMIN and CAP_SYS_CHROOT (Docker grants CAP_SYS_CHROOT by default, so the second --cap-add is redundant but harmless)
  • Needs SELinux labeling disabled
  • unshare --mount-proc creates a new mount namespace and mounts /proc automatically
  • Seccomp profile must be unconfined

For systems using Podman instead of Docker:

$ podman run --cap-add SYS_ADMIN --security-opt label=disable \
 -it fedora:rawhide /usr/bin/unshare -Ufmp -r \
 /bin/sh -c 'mount -t proc proc /proc'

When facing permission issues, these commands can help diagnose the cause:

# Check effective capabilities (run inside the container)
grep CapEff /proc/self/status

# Check SELinux context
ls -Z /proc

# Search the audit log for recent SELinux denials (run on the host as root)
ausearch -m avc -ts recent
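
The CapEff value is a hexadecimal bitmask, so it does not show directly which capabilities are present. As a rough sketch, the bits can be tested with shell arithmetic; the bit numbers 21 (CAP_SYS_ADMIN) and 18 (CAP_SYS_CHROOT) come from the kernel's capability definitions, and the helper name has_cap is hypothetical.

```shell
# Hypothetical helper: succeed if a capability bit is set in a hex mask.
# $1 = hex mask as printed in CapEff (no 0x prefix), $2 = capability bit number
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Report the two capabilities discussed above for the current process.
if [ -r /proc/self/status ] &&
   mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status) &&
   [ -n "$mask" ]; then
    for entry in CAP_SYS_ADMIN:21 CAP_SYS_CHROOT:18; do
        name=${entry%%:*}
        bit=${entry##*:}
        if has_cap "$mask" "$bit"; then
            echo "$name: present"
        else
            echo "$name: missing"
        fi
    done
fi
```

Where libcap's capsh is installed, capsh --decode=<mask> prints the full list of capability names for a mask and may be more convenient than testing individual bits.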