Recently, while managing a 3-node Docker Swarm cluster, I ran into a bizarre situation: a network appeared in docker network ls but could not be removed through normal means. The exact behavior was:
# On the affected node:
docker network ls
NETWORK ID   NAME              DRIVER    SCOPE
xyz123       problematic_net   overlay   swarm
docker network rm xyz123
Error response from daemon: network xyz123 not found
docker network inspect xyz123
[
    {
        "Name": "problematic_net",
        "Id": "xyz123",
        "Scope": "swarm",
        "Driver": "overlay",
        ...
    }
]
After digging into Docker's source code and issue tracker, I found that this typically happens when:
- The network exists in the Swarm manager's Raft store but not in the local node's network namespace
- There was an unclean shutdown during network creation/removal
- The network's state got corrupted in the Swarm control plane
Here are the effective solutions I've collected from production experience:
Method 1: Rebuild the Manager State from the Raft Store
The Raft store is not a SQLite database, so it cannot be opened or edited with sqlite3. Managers keep it as encrypted write-ahead-log and snapshot files under /var/lib/docker/swarm/raft (the swarm-rafttool utility in the swarmkit source tree can decrypt and dump them for inspection). Rather than editing those files, back up the swarm directory and let a manager rebuild the cluster from its own state. This often clears stale entries, but it is not guaranteed to:
# First identify the Swarm manager nodes
docker node ls --filter role=manager
# SSH into a manager and stop Docker temporarily
sudo systemctl stop docker
# Make a backup of the whole swarm directory before proceeding!
sudo cp -a /var/lib/docker/swarm /backup/swarm.bak
# Restart Docker
sudo systemctl start docker
# Rebuild a single-manager cluster from this manager's state
# (services and workers are kept; the other managers must be re-added)
docker swarm init --force-new-cluster
# Retry the removal on the manager
docker network rm xyz123
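The backup step above is the part most worth scripting, since it is easy to skip under pressure. Here is a minimal Python sketch, assuming the Docker daemon is already stopped and the script runs with enough privileges to read /var/lib/docker/swarm. It archives the whole swarm directory rather than a single file. The function name backup_swarm_dir and the /backup destination are illustrative choices, not part of Docker.

```python
import tarfile
import time
from pathlib import Path


def backup_swarm_dir(src="/var/lib/docker/swarm", dest_dir="/backup"):
    """Archive the swarm state directory into a timestamped .tar.gz.

    Assumes the Docker daemon is stopped so the files are not changing.
    Returns the path of the archive that was written.
    """
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"swarm directory not found: {src_path}")

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"swarm-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store entries under "swarm/..." so a restore recreates the directory.
        tar.add(str(src_path), arcname="swarm")
    return archive
```

Calling backup_swarm_dir() with no arguments uses the default paths; pass other paths to try it on a copy first.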
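Method 2: Nuclear Option - Clean Swarm
If you can afford to recreate the Swarm (services, secrets, configs and overlay networks are all lost):
# On every node, workers first and managers last:
docker swarm leave --force
# Then reinitialize on one node
docker swarm init
# Rejoin the other nodes using the command printed by:
docker swarm join-token worker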
To avoid this situation:
- Always run docker network rm on manager nodes
- Implement proper shutdown procedures (no SIGKILL to dockerd)
- Monitor network states with docker network inspect
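For those who would rather not touch anything on disk, the same removal can be attempted through the Engine API. The CLI calls the same endpoint, so expect the same outcome; the benefit is seeing the raw status code and error message:
# Using Docker's HTTP API directly
curl -X DELETE --unix-socket /var/run/docker.sock \
  "http://localhost/v1.41/networks/xyz123"
# Or using the Python docker SDK
import docker
client = docker.DockerClient(base_url='unix:///var/run/docker.sock')
try:
    client.networks.get('xyz123').remove()
except docker.errors.NotFound:
    print("Already gone")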
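If the docker SDK is not installed on the node, the same DELETE request can be sent with the standard library alone. The sketch below is a rough equivalent of the curl call, assuming the default socket path and API version v1.41 from the example. The status meanings follow the Engine API reference for DELETE /networks/{id}. The names UnixHTTPConnection, explain_delete_status and delete_network are mine, not part of any Docker library.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket, similar to curl --unix-socket."""

    def __init__(self, socket_path, timeout=10):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def explain_delete_status(status):
    """Map the status code of DELETE /networks/{id} to a short explanation."""
    meanings = {
        204: "removed",
        403: "operation not supported for pre-defined networks",
        404: "no such network",
        500: "server error",
    }
    return meanings.get(status, f"unexpected status {status}")


def delete_network(network_id, socket_path="/var/run/docker.sock",
                   api_version="v1.41"):
    """Send the DELETE request and return (status, explanation, raw body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("DELETE", f"/{api_version}/networks/{network_id}")
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", "replace")
        return resp.status, explain_delete_status(resp.status), body
    finally:
        conn.close()
```

Running print(delete_network("xyz123")) on the affected node shows whether the daemon answers 404 for a network it still lists, which is the inconsistency described here.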
In Docker Swarm environments, you might encounter a particularly frustrating situation where:
- docker network ls shows the network exists
- docker network inspect returns valid network configuration
- But docker network rm claims the network doesn't exist
This typically occurs when there's inconsistency between the swarm manager's network state and individual node states. Common triggers include:
- Partial network creation during swarm initialization
- Network removal interrupted by node failures
- Raft log synchronization issues between managers
1. Verify Network State Across All Nodes
# Run on a manager (docker node ls only works there); assumes SSH access to every node.
# Replace [network-id] with the real ID.
for node in $(docker node ls -q); do
  host=$(docker node inspect "$node" --format '{{.Description.Hostname}}')
  addr=$(docker node inspect "$node" --format '{{.Status.Addr}}')
  echo "=== Node: $host ($(docker node inspect "$node" --format '{{.Status.State}}')) ==="
  ssh "$addr" docker network ls --no-trunc | grep "[network-id]"
done
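Once the per-node listings are collected, comparing them by eye gets tedious beyond a few nodes. The following Python sketch does the comparison, assuming each listing comes from docker network ls -q --no-trunc --filter scope=swarm on that node (local-scope networks have different IDs on every node and would only add noise). The function names and node names are illustrative. Keep in mind that a worker only lists an overlay network while a task attached to it runs there, so a gap is a hint rather than proof of corruption.

```python
def parse_ls_output(text):
    """Turn the output of `docker network ls -q --no-trunc` into a set of IDs."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def find_gaps(networks_by_node):
    """Return {network_id: [nodes that do not list it]}.

    networks_by_node maps a node name to the set of swarm-scoped network
    IDs listed on that node. Networks listed on every node are omitted.
    """
    all_ids = set().union(*networks_by_node.values())
    gaps = {}
    for net_id in sorted(all_ids):
        missing = sorted(
            node for node, ids in networks_by_node.items() if net_id not in ids
        )
        if missing:
            gaps[net_id] = missing
    return gaps


# Example with made-up listings for a 3-node cluster.
listings = {
    "manager1": parse_ls_output("aaa111\nxyz123\n"),
    "worker1": parse_ls_output("aaa111\n"),
    "worker2": parse_ls_output("aaa111\nxyz123\n"),
}
for net_id, nodes in find_gaps(listings).items():
    print(f"{net_id} is not listed on: {', '.join(nodes)}")
```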
2. Force Removal from Swarm Manager
# First retry the standard removal on a manager. Note that --force (-f) only
# suppresses the error when the network does not exist; it does not force-delete anything.
docker network rm --force [network-id]
# Swarm mode does not use etcd. The step below applies only to legacy overlay
# networks backed by an external etcd store (older Docker versions, etcd v2 API).
# Point etcdctl at that etcd endpoint, never at the Docker socket.
etcdctl --endpoints [etcd-endpoint] rm /docker/network/v1.0/network/[network-id]
3. Clean Up Residual Configs
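For persistent cases, manually remove the node's local network database. This discards all locally stored network state on that node, including user-defined bridge networks, so stop the daemon first and expect to recreate those networks afterwards:
# On each affected node
sudo systemctl stop docker
sudo rm -f /var/lib/docker/network/files/local-kv.db
sudo systemctl start docker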
To prevent a recurrence:
- Always use docker network create --driver overlay --attachable for swarm networks
- Implement proper swarm backup procedures for the raft logs
- Monitor network synchronization with docker network inspect --format '{{json .Peers}}' [network-id]
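The Peers check in the last item can be automated by comparing the reported peers against the node addresses you expect. Below is a minimal Python sketch, assuming the inspect output carries a Peers list whose entries have an IP field, which is how overlay networks report it on nodes where the network is active. The function names, sample JSON and addresses are made up for illustration.

```python
import json


def peer_ips(inspect_output):
    """Extract peer IPs from the JSON printed by `docker network inspect`."""
    networks = json.loads(inspect_output)
    if not networks:
        return []
    # Peers is absent when the network is not active on this node.
    peers = networks[0].get("Peers") or []
    return sorted(p["IP"] for p in peers if "IP" in p)


def missing_peers(inspect_output, expected_ips):
    """Return the expected node IPs that do not appear among the peers."""
    return sorted(set(expected_ips) - set(peer_ips(inspect_output)))


# Made-up inspect output for a network that two of three nodes have joined.
sample = json.dumps([{
    "Name": "problematic_net",
    "Peers": [
        {"Name": "node-a", "IP": "10.0.0.11"},
        {"Name": "node-b", "IP": "10.0.0.12"},
    ],
}])
print(missing_peers(sample, ["10.0.0.11", "10.0.0.12", "10.0.0.13"]))
```

A node missing from the list is only a problem if it is supposed to run a task attached to that network.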
As a last resort, rebuild the swarm after backing up services:
# On manager node
docker swarm leave --force
sudo systemctl stop docker
sudo rm -rf /var/lib/docker/swarm
sudo systemctl start docker
docker swarm init
# Then rejoin all worker nodes (each must first run docker swarm leave)
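"Backing up services" can be as simple as saving each service's inspect output before the swarm is torn down. Here is a minimal Python sketch, assuming it runs on a manager with the docker CLI on the PATH. The backup_services name and the run parameter (there only so the function can be exercised without a daemon) are illustrative. The saved JSON is a reference for recreating services by hand, not something Docker can re-import directly, and it does not contain secret values.

```python
import json
import subprocess
from pathlib import Path


def backup_services(out_dir, run=subprocess.run):
    """Save `docker service inspect` output for every service, one file each.

    Returns the list of files written, in the order the services were listed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    listing = run(["docker", "service", "ls", "-q"],
                  capture_output=True, text=True, check=True)
    saved = []
    for service_id in listing.stdout.split():
        result = run(["docker", "service", "inspect", service_id],
                     capture_output=True, text=True, check=True)
        # inspect prints a JSON array with one object per requested service.
        name = json.loads(result.stdout)[0]["Spec"]["Name"]
        path = out / f"{name}.json"
        path.write_text(result.stdout)
        saved.append(path)
    return saved
```

If the services were deployed from stack files, keeping those files under version control is the more reliable backup; the dump above mainly helps when they are missing.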