How to Force Remove a “Ghost” Docker Swarm Network When docker network rm Fails



Recently, while managing a 3-node Docker Swarm cluster, I ran into a bizarre situation: a network appears in docker network ls (and even in docker network inspect) but cannot be removed through normal means. The exact behavior was:


# On the affected node:
docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
xyz123         problematic_net   overlay   swarm

docker network rm xyz123
Error response from daemon: network xyz123 not found

docker network inspect xyz123
[
    {
        "Name": "problematic_net",
        "Id": "xyz123",
        "Scope": "swarm",
        "Driver": "overlay",
        ...
    }
]
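To check for this symptom from a script rather than by eye, here is a minimal Python sketch. It assumes you capture the output of docker network ls --format '{{json .}}' (one JSON object per network, with ID and Name keys) together with the stderr of the failed docker network rm. The helper names are hypothetical and not part of any Docker tool.

```python
import json


def parse_network_ls_json(output):
    """Parse `docker network ls --format '{{json .}}'` output (one object per line)."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def _id_matches(listed_id, key):
    # docker network ls truncates IDs to 12 characters unless --no-trunc is
    # given, so accept a prefix match in either direction.
    return bool(listed_id) and bool(key) and (
        listed_id.startswith(key) or key.startswith(listed_id)
    )


def looks_like_ghost(networks, key, rm_error):
    """True if `key` (name or ID) is listed, yet `docker network rm` said "not found"."""
    listed = any(
        n.get("Name") == key or _id_matches(n.get("ID", ""), key) for n in networks
    )
    return listed and "not found" in rm_error.lower()
```

For example, feed it the listing and the "Error response from daemon: network xyz123 not found" message from the session above; a True result means you are looking at the inconsistency this post is about rather than a simple typo in the ID.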

After digging into Docker's source code and its issue tracker, I found that this typically happens when:

  • The network exists in the Swarm manager's Raft store but not in the local node's network state
  • There was an unclean shutdown during network creation/removal
  • The network's state got corrupted in the Swarm control plane

Here are the effective solutions I've collected from production experience:

Method 1: Back Up the Raft Store and Retry From a Manager

The network record lives in the Raft store on the manager nodes. That store is not an SQLite database: Docker keeps it as encrypted write-ahead logs and snapshots, so sqlite3 (or any other database tool) cannot open it or delete individual records. What you can do safely is back it up, restart the manager, and retry the removal there:


# First identify the Swarm manager nodes
docker node ls --filter role=manager

# SSH into one manager and stop Docker temporarily
# (one manager at a time, so the remaining managers keep quorum)
sudo systemctl stop docker

# Make a backup of the whole swarm state directory before proceeding!
sudo mkdir -p /backup
sudo cp -a /var/lib/docker/swarm /backup/swarm.bak

# There is no raft.db file here: the Raft data typically sits in
# /var/lib/docker/swarm/raft/ (wal-v3-encrypted/ and snap-v3-encrypted/)
sudo ls /var/lib/docker/swarm/raft

# Restart Docker and retry the removal from this manager
sudo systemctl start docker
docker network rm xyz123
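The backup step can also be scripted. A minimal Python sketch, assuming dockerd is already stopped on this manager and that the swarm directory contains a raft/ subdirectory (where the Raft logs and snapshots are typically kept). The function name and the /backup destination are hypothetical choices, not Docker conventions.

```python
import shutil
import time
from pathlib import Path


def backup_swarm_state(src="/var/lib/docker/swarm", dest_root="/backup"):
    """Copy the whole swarm state directory into a timestamped folder.

    Run this only while dockerd is stopped, so the Raft files are not being
    written during the copy. Returns the path of the new backup.
    """
    src_path = Path(src)
    # Sanity check (assumption): the Raft logs and snapshots live in raft/.
    if not (src_path / "raft").is_dir():
        raise FileNotFoundError(f"no raft/ directory under {src_path}")
    dest = Path(dest_root) / time.strftime("swarm-%Y%m%d-%H%M%S")
    # symlinks=True copies links as links instead of following them.
    shutil.copytree(src_path, dest, symlinks=True)
    return dest
```

Copying the whole directory (rather than picking individual files) keeps the Raft data and its encryption material together, which is what a restore would need.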

Method 2: Nuclear Option - Clean Swarm

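If you can afford to recreate the Swarm (this discards every service, secret, config and overlay network it holds):


# On every node, workers first, then managers:
docker swarm leave --force

# Then reinitialize on one manager and print the join command for the others
docker swarm init
docker swarm join-token worker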
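The order of operations is easy to get wrong on a larger cluster. A minimal Python sketch that only builds the ordered command list (it executes nothing); rebuild_plan and the host names are hypothetical, and on hosts with several network interfaces docker swarm init will also need --advertise-addr.

```python
def rebuild_plan(managers, workers):
    """Return ordered (host, command) pairs for recreating a swarm from scratch.

    Nothing is executed: the caller decides how to run each command (SSH, etc.).
    """
    if not managers:
        raise ValueError("need at least one manager")
    # Workers leave first; a plain leave is enough for them.
    plan = [(host, "docker swarm leave") for host in workers]
    # Managers need --force to leave; the last one out discards the old state.
    plan += [(host, "docker swarm leave --force") for host in managers]
    leader = managers[0]
    plan.append((leader, "docker swarm init"))
    # These print the join commands to run on the remaining nodes.
    plan.append((leader, "docker swarm join-token worker"))
    if len(managers) > 1:
        plan.append((leader, "docker swarm join-token manager"))
    return plan
```

For a cluster with managers ["m1", "m2"] and worker ["w1"], the plan starts with w1 leaving, then m1 and m2 leaving with --force, then m1 running init and printing both join tokens.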

To avoid this situation:

  • Always use docker network rm on manager nodes
  • Implement proper shutdown procedures (no SIGKILL to dockerd)
  • Monitor network state with docker network ls and docker network inspect, and investigate networks that appear on one node but not another
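To make the monitoring advice concrete, here is a minimal sketch using the Docker SDK for Python. It snapshots the swarm-scoped networks a manager reports (using the scope filter of the network list endpoint) and diffs two snapshots taken at different times. snapshot_networks and diff_snapshots are hypothetical helper names; the client is passed in, normally docker.from_env(), so the logic can also be exercised with a stub.

```python
def snapshot_networks(client):
    """Map network ID -> name for every swarm-scoped network this daemon lists.

    `client` is assumed to be a Docker SDK for Python client (docker.from_env()).
    """
    return {
        net.id: net.name
        for net in client.networks.list(filters={"scope": "swarm"})
    }


def diff_snapshots(before, after):
    """Return (appeared, disappeared) network IDs between two snapshots."""
    appeared = sorted(after.keys() - before.keys())
    disappeared = sorted(before.keys() - after.keys())
    return appeared, disappeared
```

Running this periodically on a manager and alerting on unexpected changes gives an early warning before a half-removed network turns into a ghost.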

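If you prefer to script the removal, call the Engine API directly. These hit the same DELETE /networks/{id} endpoint that docker network rm uses, so run them on a manager:


# Using Docker's HTTP API directly (curl needs a host in the URL; localhost is conventional)
curl -X DELETE --unix-socket /var/run/docker.sock \
  "http://localhost/v1.41/networks/xyz123"

# Or using the Docker SDK for Python (pip install docker)
import docker

client = docker.DockerClient(base_url='unix://var/run/docker.sock')
try:
    network = client.networks.get('xyz123')
except docker.errors.NotFound:
    print("Already gone")
else:
    try:
        network.remove()
    except docker.errors.APIError as err:
        # In the ghost case, inspect succeeds but remove still reports "not found"
        print(f"Removal failed: {err}")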
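If neither curl nor the SDK is available, Python's standard library can talk to the Docker socket directly. A minimal sketch (UnixHTTPConnection and delete_network are hypothetical names) that sends the same DELETE request and maps the status codes the Engine API documents for this endpoint; v1.41 is simply the version used in the curl example, so adjust it to your daemon. It will not get past a daemon that insists the network does not exist; it only makes the raw status and error body visible.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


# Responses documented for DELETE /networks/{id} in the Engine API.
STATUS_MEANING = {
    204: "removed",
    403: "operation not supported for pre-defined networks",
    404: "no such network",
    500: "server error",
}


def delete_network(network_id, socket_path="/var/run/docker.sock", api="v1.41"):
    """Send DELETE /<api>/networks/<id>; return (status, meaning, body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("DELETE", f"/{api}/networks/{network_id}")
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", "replace")
        return resp.status, STATUS_MEANING.get(resp.status, "unexpected"), body
    finally:
        conn.close()
```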

In Docker Swarm environments, you might encounter a particularly frustrating situation where:

  • docker network ls shows the network exists
  • docker network inspect returns valid network configuration
  • But docker network rm claims the network doesn't exist

This typically occurs when there's inconsistency between the swarm manager's network state and individual node states. Common triggers include:

  • Partial network creation during swarm initialization
  • Network removal interrupted by node failures
  • Raft log synchronization issues between managers

1. Verify Network State Across All Nodes

# Run on a manager node (docker node ls only works there); assumes SSH access
# to every node by hostname. Replace [network-id] with the real ID.
for node in $(docker node ls -q); do
  host=$(docker node inspect "$node" --format '{{.Description.Hostname}}')
  echo "=== Node: $host ==="
  docker node inspect "$node" --format '{{.Status.State}}'
  ssh "$host" docker network ls | grep -F '[network-id]'
done
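Once you have collected each node's network IDs (for example with docker network ls -q over SSH, as in the loop above), the comparison itself is simple. A minimal Python sketch; find_inconsistent_nodes is a hypothetical helper. Keep in mind that an overlay network is only extended to a worker when a task using it is scheduled there, so a worker missing the network is not necessarily a problem, while a manager listing a network that it cannot remove is.

```python
def find_inconsistent_nodes(per_node, network_id):
    """Split nodes by whether they list `network_id`.

    `per_node` maps hostname -> set of network IDs that node reports.
    IDs are compared by prefix in both directions because
    `docker network ls` truncates them unless --no-trunc is used.
    """
    has, lacks = [], []
    for host, ids in sorted(per_node.items()):
        if any(i.startswith(network_id) or network_id.startswith(i) for i in ids if i):
            has.append(host)
        else:
            lacks.append(host)
    return has, lacks
```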

2. Force Removal from Swarm Manager

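# First try standard removal from a manager. Note: --force (recent CLIs only)
# just suppresses the "not found" error; it does not force-delete anything.
docker network rm --force [network-id]

# Legacy only: overlay networks created against an external etcd KV store
# (daemon --cluster-store, pre-swarm-mode setups) can be deleted from that store.
# Point etcdctl at your etcd server, not docker.sock; this does not apply to swarm mode.
ETCDCTL_API=2 etcdctl --endpoints http://[etcd-host]:2379 \
  rm /docker/network/v1.0/network/[network-id]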
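Because swarm-scoped networks can only be removed through a manager, it can be worth retrying the removal against each manager in turn. A minimal Python sketch using the Docker CLI's SSH transport (docker -H ssh://host, available since Docker 18.09); it assumes SSH access to each manager and a Docker CLI on the machine running it. remove_via_managers is a hypothetical helper, and the run parameter exists only so the logic can be tested without Docker.

```python
import subprocess


def remove_via_managers(network_id, managers, run=subprocess.run):
    """Try `docker network rm` against each manager until one succeeds.

    Returns (host, errors): the manager that succeeded (or None) and a dict
    of host -> stderr for every manager that failed before that.
    """
    errors = {}
    for host in managers:
        result = run(
            ["docker", "-H", f"ssh://{host}", "network", "rm", network_id],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return host, errors
        errors[host] = result.stderr.strip()
    return None, errors
```

If every manager returns the same "not found" error, the inconsistency is in the shared Raft state rather than on one node, which points toward the rebuild options instead.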

3. Clean Up Residual Configs

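For persistent cases, reset the node's local network database. This also discards any locally scoped networks (such as user-defined bridge networks) on that node, so be ready to recreate them:

# On each affected node: stop Docker, remove the file, start Docker again
sudo systemctl stop docker
sudo rm -f /var/lib/docker/network/files/local-kv.db
sudo systemctl start docker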
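A slightly safer variant of the cleanup above is to move the file aside instead of deleting it, so it can be restored if the reset causes other trouble. A minimal Python sketch, assuming dockerd is stopped while it runs; move_local_kv_aside and the .bak naming scheme are hypothetical choices.

```python
import shutil
import time
from pathlib import Path

LOCAL_KV = Path("/var/lib/docker/network/files/local-kv.db")


def move_local_kv_aside(db_path=LOCAL_KV):
    """Rename local-kv.db to a timestamped .bak file instead of deleting it.

    Run only while dockerd is stopped. Returns the backup path, or None if
    the database file is not present.
    """
    db_path = Path(db_path)
    if not db_path.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = db_path.with_name(f"{db_path.name}.{stamp}.bak")
    shutil.move(str(db_path), str(backup))
    return backup
```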

To prevent this from recurring:

  • Always use docker network create --driver overlay --attachable for swarm networks
  • Implement proper swarm backup procedures for the raft logs
  • Monitor network synchronization with docker network inspect --format '{{json .Peers}}' [network-id]
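The Peers check in the last tip can be automated. The command prints a JSON list of peer objects with Name and IP keys (or null when there are none). A minimal Python sketch that reports which expected node addresses are absent; missing_peer_ips is a hypothetical helper, and the expected IPs are an input you supply (for example each node's address from docker node inspect). Only nodes that currently run tasks on the network are expected to appear.

```python
import json


def missing_peer_ips(peers_json, expected_ips):
    """Return expected node IPs that are absent from an overlay network's peers.

    `peers_json` is the output of
    `docker network inspect --format '{{json .Peers}}' <network-id>`.
    """
    peers = json.loads(peers_json) or []  # "null" -> no peers at all
    seen = {p.get("IP") for p in peers}
    return sorted(set(expected_ips) - seen)
```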

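As a last resort, rebuild the swarm after backing up your service definitions:

# On the manager node (after backing up /var/lib/docker/swarm)
docker swarm leave --force
sudo systemctl stop docker
sudo rm -rf /var/lib/docker/swarm
sudo systemctl start docker
docker swarm init

# Then rejoin all worker nodes: print the join command here,
# and run it on each worker after "docker swarm leave" there
docker swarm join-token worker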
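For the "backing up services" step, one option is to keep each service's Spec from docker service inspect. A minimal Python sketch; save_service_specs is a hypothetical helper, and the saved files are a reference for recreating services (with docker service create or a stack file), not something Docker can import directly. Networks inside a spec are typically referenced by ID, which will not exist in the rebuilt swarm, and secret values are never returned by the API, so keep those separately.

```python
import json
from pathlib import Path


def save_service_specs(inspect_json, out_dir):
    """Write each service's Spec to <out_dir>/<service-name>.json.

    `inspect_json` is the output of `docker service inspect <svc> [<svc>...]`
    (a JSON array). Only the Spec is kept; IDs, versions and timestamps are
    cluster-specific. Returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for svc in json.loads(inspect_json):
        spec = svc["Spec"]
        path = out / f"{spec['Name']}.json"
        path.write_text(json.dumps(spec, indent=2, sort_keys=True))
        written.append(path)
    return written
```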